Abstract: We establish exponential bounds for the hypergeometric distribution which include a finite
1. Introduction and overview

In this paper we derive several exponential bounds for the tail of the hypergeometric distribution. This distribution emerges as an extreme case in the setting of sampling without replacement from a finite population. We begin with a description of this setting. Consider a population $C$ containing $N$ elements, $C := \{c_1, \ldots, c_N\}$ with $c_i \in \mathbb{R}$. Let $N = |C|$ denote the cardinality of this set, $a$ the value of the minimum element, $b$ the value of the maximum element, and $\mu := N^{-1}\sum_{i=1}^N c_i$ the population mean. Let $1 \le i \le n \le N$, and let $X_i$ denote the $i$th draw without replacement from this population. Finally, let $S_n := \sum_{i=1}^n X_i$ denote the sum of this sampling procedure, and let $\bar X_n := S_n/n$ denote the sample mean.

R. J. Serfling obtained the following bound. 

Finite Sampling Bound 1. (Serfling [19]) For $1 \le n \le N$, $S_n$ the sum in sampling without replacement, and $\lambda > 0$:

$$P\big(\sqrt{n}(\bar X_n - \mu) \ge \lambda\big) \le \exp\left(-\frac{2\lambda^2}{(1 - f_n^*)(b - a)^2}\right) \qquad (1.1)$$

where $f_n^* := (n - 1)/N$.
This result applies to sampling without replacement from any finite bounded population. Let $D, N \in \mathbb{N}$ such that $D < N$. Then as a special case we may apply the bound to a population of $N$ elements containing $D$ 1’s and $N - D$ 0’s. Note that in this specific case $S_n =: S_{n,D,N} \sim \mathrm{Hypergeometric}(n, D, N)$.

For the hypergeometric distribution, the following facts are well known: 


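$$\begin{aligned}
P(S_n = k) &= \frac{\binom{D}{k}\binom{N-D}{n-k}}{\binom{N}{n}}, \qquad \max\{0, n - (N - D)\} \le k \le \min\{D, n\},\\
E(S_n) &= n\frac{D}{N} = n\mu_{D,N},\\
\mathrm{Var}(S_n) &= n\frac{D}{N}\Big(1 - \frac{D}{N}\Big)\Big(1 - \frac{n-1}{N-1}\Big) = n\mu_{D,N}(1 - \mu_{D,N})(1 - f_n), \qquad (1.2)
\end{aligned}$$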


with the final line defining $\mu_{D,N} := D/N$ and $f_n := (n - 1)/(N - 1)$. Applying Serfling’s result to the case of the hypergeometric distribution immediately gives

$$P\big(\sqrt{n}(\bar X_n - \mu_{D,N}) \ge \lambda\big) \le \exp\left(-\frac{2\lambda^2}{1 - f_n^*}\right) \qquad (1.3)$$

since $(b - a)^2 = (1 - 0)^2 = 1$. Comparison of the factor $1 - f_n^*$ in Serfling’s bound to the factor $1 - f_n$ in (1.2) suggests the following question: can Serfling’s bound be improved to

$$P\big(\sqrt{n}(\bar X_n - \mu) \ge \lambda\big) \le \exp\left(-\frac{2\lambda^2}{(1 - f_n)(b - a)^2}\right) \qquad (1.4)$$

in general, or at least in the special case of the hypergeometric distribution? 
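The facts in (1.2) are easy to confirm by exact arithmetic. The following Python sketch (the function names are ours and purely illustrative) computes the hypergeometric probabilities with `fractions.Fraction`, checks the mean and variance against the closed forms in (1.2), and returns the three correction factors $1 - n/N$, $1 - f_n$ and $1 - f_n^*$ that appear in the bounds discussed here.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, n, D, N):
    """Exact P(S_n = k) for S_n ~ Hypergeometric(n, D, N)."""
    if k < max(0, n - (N - D)) or k > min(D, n):
        return Fraction(0)
    return Fraction(comb(D, k) * comb(N - D, n - k), comb(N, n))


def check_moments(n, D, N):
    """Return ((mean, var) computed from the pmf, (mean, var) from (1.2))."""
    support = range(max(0, n - (N - D)), min(D, n) + 1)
    pmf = {k: hypergeom_pmf(k, n, D, N) for k in support}
    mean = sum(k * p for k, p in pmf.items())
    var = sum((k - mean) ** 2 * p for k, p in pmf.items())
    mu = Fraction(D, N)
    f_n = Fraction(n - 1, N - 1)
    return (mean, var), (n * mu, n * mu * (1 - mu) * (1 - f_n))


def correction_factors(n, N):
    """(1 - n/N, 1 - f_n, 1 - f_n^*) with f_n = (n-1)/(N-1) and f_n^* = (n-1)/N."""
    return 1 - Fraction(n, N), 1 - Fraction(n - 1, N - 1), 1 - Fraction(n - 1, N)
```

For example, `check_moments(10, 30, 100)` returns two identical pairs, and the three factors from `correction_factors` are always in increasing order, so $1 - f_n$ is a strictly sharper factor than Serfling’s $1 - f_n^*$ whenever $n > 1$.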

To date, the improvement conjectured in (1.4) has not been obtained. For the special case of the 
hypergeometric, Hush and Scovel derived the following bound by extending an argument given by Vapnik. 
See [12] and [23]. 

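Hypergeometric Bound 1. (Hush and Scovel [12]) Suppose $S_n \sim \mathrm{Hypergeometric}(n, D, N)$. Then for all $\lambda > 0$ we have

$$P\big(\sqrt{n}(\bar X_n - \mu_{D,N}) > \lambda\big) < \exp\big(-2\alpha_{n,D,N}(n\lambda^2 - 1)\big) \qquad (1.5)$$

where

$$\alpha_{n,D,N} := \left(\frac{1}{n+1} + \frac{1}{N-n+1}\right) \vee \left(\frac{1}{D+1} + \frac{1}{N-D+1}\right).$$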

More recently, Bardenet and Maillard used a reverse-martingale argument to remedy a deficiency in Serfling’s inequality that occurs when more than half the population is sampled without replacement. The statement here is a specialization of their Theorem 2.4 to the hypergeometric case. See [1] for additional discussion.

Hypergeometric Bound 2. (Bardenet and Maillard [1]) Suppose $S_n \sim \mathrm{Hypergeometric}(n, D, N)$. Then for all $\lambda > 0$ and $n < N$ we have

$$P\big(\sqrt{n}(\bar X_n - \mu_{D,N}) \ge \lambda\big) \le \exp\left(-\frac{2\lambda^2}{(1 - n/N)(1 + 1/n)}\right).$$
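To see the "more than half the population" effect concretely, the sketch below evaluates the right-hand sides of (1.3) and of Hypergeometric Bound 2 next to the exact tail probability. It is a minimal illustration with hypothetical function names, not code from [1] or [19]; the Hush and Scovel bound (1.5) is left out.

```python
import math


def hypergeom_upper_tail(lam, n, D, N):
    """Exact P(sqrt(n)(Xbar_n - D/N) >= lam) = P(S_n >= n D/N + sqrt(n) lam)."""
    threshold = n * D / N + math.sqrt(n) * lam
    # small tolerance so that an integer threshold is not pushed up by rounding
    k_min = max(math.ceil(threshold - 1e-12), max(0, n - (N - D)))
    hits = sum(math.comb(D, k) * math.comb(N - D, n - k)
               for k in range(k_min, min(D, n) + 1))
    return hits / math.comb(N, n)


def serfling_bound(lam, n, N):
    """Right-hand side of (1.3): exp(-2 lam^2 / (1 - f_n^*)), f_n^* = (n-1)/N."""
    return math.exp(-2 * lam**2 / (1 - (n - 1) / N))


def bardenet_maillard_bound(lam, n, N):
    """Right-hand side of Hypergeometric Bound 2 (requires n < N)."""
    return math.exp(-2 * lam**2 / ((1 - n / N) * (1 + 1 / n)))
```

Comparing the two denominators shows that $(1 - n/N)(1 + 1/n) < 1 - (n-1)/N$ exactly when $n > N/2$, so Serfling’s bound is the smaller one for $n < N/2$ and the Bardenet and Maillard bound takes over for $n > N/2$.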

We will justify the special consideration given to the hypergeometric distribution relative to the goal of 
obtaining (1.4) by adapting a result of Kemperman [14] to derive a convex order between samples without 
replacement from populations consisting of elements in [0,1] and the hypergeometric distribution. We will
then demonstrate how one may use this convex order to obtain exponential bounds for the more general 
problem of sampling without replacement from a bounded, finite population. In doing so, we return to the 
setting of Serfling. In this setting we will consider the variance of the population as well. Anticipating this, 
we conclude the introduction with a specialization of a bound of Bardenet and Maillard which incorporates 
information about the population variance into the bound. 

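Finite Sampling Bound 2. (Bardenet and Maillard [1]) For $1 \le n < N$, $S_n$ the sum in sampling without replacement from a population $c := \{c_1, \ldots, c_N\}$, $\delta \in [0, 1]$, and $\lambda > 0$, we have

$$P\big(\sqrt{n}(\bar X_n - \mu) \ge \lambda\big) \le \exp\left(-\frac{\lambda^2}{2\big(\gamma^2 + (2/3)\lambda(b - a)/\sqrt{n}\big)}\right) + \delta \qquad (1.6)$$

where

$$a := \min_{1 \le i \le N} c_i, \quad b := \max_{1 \le i \le N} c_i, \quad f_n^* := (n - 1)/N, \quad \mu := \frac{1}{N}\sum_{i=1}^N c_i, \quad \sigma^2 := \frac{1}{N}\sum_{i=1}^N c_i^2 - \mu^2,$$

$$\gamma^2 := (1 - f_n^*)\sigma^2 + f_n^* c_{n-1}(\delta), \quad\text{and}\quad c_n(\delta) := \sigma(b - a)\sqrt{\frac{2\log(1/\delta)}{n}}.$$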


2. Exponential Bounds 

Binomial distributions arise when sampling with replacement from a population consisting only of 0’s and 1’s. As we saw in the introduction, hypergeometric distributions arise when sampling without replacement from such populations. Intuitively, sampling without replacement is more informative than sampling with replacement: when items are not replaced, eventually, when $n = N$, the entire population is sampled. This being the case, it is natural to guess that upper bounds which apply to binomial tail probabilities will also apply to the hypergeometric tail probabilities.

Hoeffding [10] proved that this guess is true for exponential bounds derived via the Cramér-Chernoff
method. This is because a convex order exists between samples with and without replacement (Hoeffding 
proves this order in his Theorem 4). Convex orders between a variety of sampling plans were subsequently 
explored by Kemperman [14] and Karlin [13]. 

Note that by Ehm (Theorem 2 [4]; see also Holmes, Theorem 3.2 [11]), the total variation distance between the hypergeometric distribution $P^{H}_{n,D,N}$ and the binomial distribution $P^{B}_{n,D/N}$ satisfies

$$d_{TV}\big(P^{H}_{n,D,N}, P^{B}_{n,D/N}\big) \le \frac{n-1}{N-1}\Big(1 - (D/N)^{n+1} - (1 - D/N)^{n+1}\Big) \le \frac{n-1}{N-1},$$

so we expect that the binomial bounds will be essentially optimal when $(n - 1)/(N - 1) \to 0$.
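The total variation distance above can be computed directly for moderate $n$. The sketch below (the helper name is hypothetical) sums $|P^H(k) - P^B(k)|$ over $k = 0, \ldots, n$ and halves the result, so it can be compared with the simpler upper bound $(n - 1)/(N - 1)$; it uses only the standard library.

```python
import math


def tv_hypergeom_binom(n, D, N):
    """Total variation distance (1/2) sum_k |P_H(k) - P_B(k)| between
    Hypergeometric(n, D, N) and Binomial(n, D/N)."""
    p = D / N
    total = math.comb(N, n)
    dist = 0.0
    for k in range(n + 1):
        # math.comb returns 0 when k > D or n - k > N - D, i.e. outside the support
        hyp = math.comb(D, k) * math.comb(N - D, n - k) / total
        binom = math.comb(n, k) * p**k * (1 - p) ** (n - k)
        dist += abs(hyp - binom)
    return dist / 2
```

For instance, with $n = 2$, $D = 2$, $N = 4$ the hypergeometric probabilities are $1/6, 2/3, 1/6$ against the binomial $1/4, 1/2, 1/4$, giving a distance of $1/6$, below $(n-1)/(N-1) = 1/3$.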

Here we are interested in sampling scenarios in which $(n - 1)/(N - 1) \not\to 0$. Given the similarity in scenarios that produce binomial and hypergeometric probabilities, one might expect that the binomial exponential bounds provide a clue to the form that hypergeometric exponential bounds will take: hypergeometric bounds look like binomial bounds, with a finite sampling correction factor included. Indeed, this is the case when we compare the bound of Serfling (1.1) to Hoeffding’s uniform bound (Theorem 2.1 [10]), since the only difference between the two is the quantity $1 - f_n^*$. We therefore state several exponential bounds which apply to the binomial distribution.
Binomial Bound 1. (Leon and Perron [15]) Let $n > 1$, $p \in (0, 1)$, $\lambda < \sqrt{n}/2$, and $X_1, \ldots, X_n$ be independent Bernoulli($p$) random variables. Then

$$P\big(\sqrt{n}(\bar X_n - p) \ge \lambda\big) \le [\text{right-hand side illegible in source}] \qquad (2.1)$$


The second bound was established by Talagrand [21, pp. 48-50]. The statement here is taken from van der Vaart and Wellner [22, pp. 460-462]:

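Binomial Bound 2. (Talagrand [21]) Fix $p_0$ and consider $p$ such that $0 < p_0 \le p \le 1 - p_0 < 1$. Suppose for $n \in \mathbb{N}$ that $X_1, \ldots, X_n$ are i.i.d. Bernoulli($p$) random variables. Then there exist constants $K_1$ and $K_2$, depending only on $p_0$, such that:

(i) For all $\lambda > 0$, $P\big(\sqrt{n}(\bar X_n - p) = \lambda\big) \le \frac{K_1}{\sqrt{n}}\exp\big(-[\text{exponent illegible in source}]\big)$.

(ii) For all $0 < t < \lambda$, [inequality for $P\big(\sqrt{n}(\bar X_n - p) \ge t\big)$ illegible in source; it involves the constant $K_2$ and the factor $\exp(5\lambda[\lambda - t])$].

(iii) For all $\lambda > 0$, $P\big(\sqrt{n}(\bar X_n - p) \ge \lambda\big) \le [\text{right-hand side illegible in source}]$. $\qquad (2.2)$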


Another well-known exponential bound which applies to sums of independent random variables (and 
consequently the binomial distribution) was discovered by Bennett [2]. Bennett’s bound incorporates 
information about the population variance, and so obtains notable improvements when the population 
variance is small. This statement of Bennett’s inequality specialized to the binomial setting is adapted 
from Shorack and Wellner [20]. 

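Binomial Bound 3. (Bennett [2]) Let $X_1, \ldots, X_n$ be i.i.d. Bernoulli($\mu$), with $0 < \mu < 1/2$. Then for all $\lambda > 0$

$$P\big(\sqrt{n}(\bar X_n - \mu) \ge \lambda\big) \le \exp\left(-\frac{\lambda^2}{2\mu(1 - \mu)}\,\psi\left(\frac{\lambda}{\sqrt{n}\,\mu(1 - \mu)}\right)\right) \qquad (2.3)$$

where $\psi(\lambda) := (2/\lambda^2)h(1 + \lambda)$ and $h(\lambda) := \lambda(\log\lambda - 1) + 1$.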

Inspecting the form of (2.1), (2.2), and (2.3), we notice that, when these bounds are compared to the hypergeometric tail bound (1.3) obtained from Serfling’s bound, they do not take advantage of the finite sampling setting. Our hope, then, is that we can derive probability bounds which look like the preceding binomial expressions, but improved by a finite-sampling correction factor. Such improved bounds exist and are the main results of this paper. Their statements follow.

Theorem 1. Suppose $S_n \sim \mathrm{Hypergeometric}(n, D, N)$. Define $\mu := D/N$, and suppose $N > 4$ and $2 < n < D < N/2$. Then for all $0 < \lambda < \sqrt{n}/2$ we have

$$P\big(\sqrt{n}(\bar X_n - \mu) \ge \lambda\big) \le [\text{right-hand side illegible in source}] \qquad (2.4)$$
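Theorem 2. Suppose $S_n \sim \mathrm{Hypergeometric}(n, D, N)$. Define $\varphi := n/N$ and $\mu := D/N$, and let $n < D$. Fix $\mu_0, \varphi_0 > 0$ such that $0 < \mu_0 \le \mu \le 1 - \mu_0 < 1$ and $0 < \varphi_0 \le \varphi \le 1 - \varphi_0 < 1$. Then there exist constants $K_1, K_2$ depending only on $\mu_0$ and $\varphi_0$ such that:

(i) For all $\lambda > 0$, $P\big(\sqrt{n}(\bar X_n - \mu) = \lambda\big) \le [\text{right-hand side illegible in source}]$.

(ii) For all $0 < t < \lambda$, [inequality illegible in source].

(iii) For all $\lambda > 0$, $P\big(\sqrt{n}(\bar X_n - \mu) \ge \lambda\big) \le [\text{right-hand side, involving } K_2, \text{ illegible in source}]$. $\qquad (2.5)$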


We are also able to obtain an analogue of Bennett’s inequality by using an important representation of 
the hypergeometric distribution as a sum of independent Bernoulli random variables with different means. 
This representation results from a special case of results established by Vatutin and Mikhailov [24] (also see 
Ehm [4], Theorem A, and Pitman [17]). 

Hypergeometric Representation Theorem 1. If $1 \le n \le D \wedge (N - D)$, then

$$S_{n,D,N} \stackrel{d}{=} \sum_{i=1}^n Y_i \qquad (2.6)$$

where $Y_i \sim \mathrm{Bernoulli}(\mu_i)$ are independent.

We may use this representation along with Bennett’s inequality to obtain a Bennett-type exponential 
bound for Hypergeometric random variables (this bound was also discussed earlier in [7], though without 
proof). The proof of this claim is short, so we will provide it here. 

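Theorem 3. Suppose $S_{n,D,N} \sim \mathrm{Hypergeometric}(n, D, N)$ with $1 \le n \le D \wedge (N - D)$. Define $\mu_N := D/N$, $\sigma_N^2 := \mu_N(1 - \mu_N)$, and let $1 - f_n := 1 - (n - 1)/(N - 1)$ be the finite-sampling correction factor. Then for all $\lambda > 0$

$$P\big(\sqrt{n}(\bar X_{n,D,N} - \mu_N) \ge \lambda\big) \le \exp\left(-\frac{\lambda^2}{2\sigma_N^2(1 - f_n)}\,\psi\left(\frac{\lambda}{\sqrt{n}\,\sigma_N^2(1 - f_n)}\right)\right) \qquad (2.7)$$

where $\psi(\lambda) := (2/\lambda^2)h(1 + \lambda)$ and $h(\lambda) := \lambda(\log\lambda - 1) + 1$.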
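Proof. Under the hypotheses it follows from (2.6) that

$$\begin{aligned}
P\big(\sqrt{n}(\bar X_{n,D,N} - \mu_N) \ge \lambda\big) &= P\Big(n^{-1/2}\sum_{i=1}^n (Y_i - \mu_i) \ge \lambda\Big)\\
&\le \exp\left(-\frac{\lambda^2}{2\,n^{-1}\sum_{i=1}^n \mu_i(1 - \mu_i)}\,\psi\left(\frac{\lambda n^{-1/2}}{n^{-1}\sum_{i=1}^n \mu_i(1 - \mu_i)}\right)\right)\\
&= \exp\left(-\frac{\lambda^2}{2\mu_N(1 - \mu_N)(1 - f_n)}\,\psi\left(\frac{\lambda n^{-1/2}}{\mu_N(1 - \mu_N)(1 - f_n)}\right)\right)\\
&= \exp\left(-\frac{\lambda^2}{2\sigma_N^2(1 - f_n)}\,\psi\left(\frac{\lambda}{\sqrt{n}\,\sigma_N^2(1 - f_n)}\right)\right). \qquad (2.8)
\end{aligned}$$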


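Note that (2.8) follows by applying Bennett’s inequality (his general inequality, rather than the binomial specialization), which is applicable since the $Y_i$ are independent Bernoulli($\mu_i$) random variables and hence $Y_i - \mu_i \le 1$ a.s. for $1 \le i \le n$. The second equality uses $\sum_{i=1}^n \mu_i(1 - \mu_i) = \mathrm{Var}(S_{n,D,N}) = n\mu_N(1 - \mu_N)(1 - f_n)$, which follows from (2.6) and (1.2). This gives the bound. □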

Since $\psi(v) \ge 1/(1 + v/3)$ for all $v > 0$ (Shorack and Wellner [20], Proposition 1, page 441), Theorem 3 immediately yields the following Bernstein-type tail bound.


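Corollary 1. With the same assumptions and notation as in Theorem 3,

$$P\big(\sqrt{n}(\bar X_{n,D,N} - \mu_N) \ge \lambda\big) \le \exp\left(-\frac{\lambda^2/2}{\sigma_N^2(1 - f_n) + \lambda/(3\sqrt{n})}\right). \qquad (2.9)$$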
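Both (2.7) and (2.9) are explicit, so they are straightforward to evaluate. The following sketch implements $h$, $\psi$ and the two right-hand sides directly from the definitions above; the function names are ours, and the code assumes the hypotheses of Theorem 3 ($1 \le n \le D \wedge (N - D)$, $\lambda > 0$) without checking them.

```python
import math


def h(x):
    """h(x) = x (log x - 1) + 1 for x > 0."""
    return x * (math.log(x) - 1) + 1


def psi(v):
    """Bennett's function psi(v) = (2 / v^2) h(1 + v) for v > 0."""
    return 2 * h(1 + v) / v**2


def bennett_hypergeom_bound(lam, n, D, N):
    """Right-hand side of (2.7)."""
    mu = D / N
    s2 = mu * (1 - mu) * (1 - (n - 1) / (N - 1))  # sigma_N^2 (1 - f_n)
    return math.exp(-lam**2 / (2 * s2) * psi(lam / (math.sqrt(n) * s2)))


def bernstein_hypergeom_bound(lam, n, D, N):
    """Right-hand side of (2.9)."""
    mu = D / N
    s2 = mu * (1 - mu) * (1 - (n - 1) / (N - 1))  # sigma_N^2 (1 - f_n)
    return math.exp(-lam**2 / (2 * (s2 + lam / (3 * math.sqrt(n)))))
```

Because $\psi(v) \ge 1/(1 + v/3)$, the value returned by `bennett_hypergeom_bound` never exceeds the value returned by `bernstein_hypergeom_bound` for the same arguments.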

Detailed proofs of the bounds (2.4) and (2.5) are provided in section 4. The proofs of these two bounds are 
complicated and do not proceed by the Cramér-Chernoff method. The proof of (2.4) adapts the argument
of Leon and Perron for the binomial distribution to the hypergeometric case. In adapting the argument, we 
derive an analogue of a well-known binomial tail probability bound going back at least to Feller [5, pp. 150-151]: see Lemma 7 for details. The proof of (2.5) adapts Talagrand’s argument to the hypergeometric setting.
The tools developed in the course of the proofs are specialized to the analysis of binomial coefficients. As 
such, they may prove useful in understanding how to analyze the tail of distributions such as the multinomial 
and multivariate hypergeometric by providing guidance for parametrizations which could appear in those 
settings after the application of Stirling’s formula. 

Note that if $N \nearrow \infty$ with $n$ fixed, (2.4) yields a slight improvement of (2.1), the bound of Leon and Perron, since it contains a quartic term in the exponential. Recovery of this sort is exactly the behavior we would expect in the limit, since (2.1) bounds binomial probabilities and as $N \nearrow \infty$ with $n/N \to 0$ the hypergeometric law converges to the binomial. A similar limiting argument shows we may recover (2.2) from (2.5) as well as (2.3) from (2.7).

Also observe that the bounds (2.4) and (2.5) contain terms involving $1 - n/N$, which incorporates information about the proportion of the population sampled into the bound. This sampling fraction is sharper than the improvement conjectured in Serfling’s bound: $1 - n/N \le 1 - (n - 1)/(N - 1) \le 1 - (n - 1)/N$. For $\lambda > \sqrt{n}(N - n)/(2(2N - n))$, the expression outside the exponential terms in (2.4) exceeds the non-exponential expression in (2.1). However, for such $\lambda$ the increase in magnitude is compensated for by the $1 - n/N$ term appearing in the exponent.

Figure 1 demonstrates the benefit of including a finite-sampling correction factor inside the exponential term: when enough of the population is sampled, the binomial and hypergeometric bounds can differ by as much as 1/4 for specific deviation values. Figure 2 compares the performance of the new hypergeometric bounds to each other and to the bounds of Serfling (1.1) and Hush and Scovel (1.5). It also provides some insight as to when (2.4) outperforms (2.7) and vice versa. The finite-but-unspecified constants appearing in (2.5) prevent its inclusion in the figures. Additionally, the constants are not immediately comparable to those in (2.2) because they depend on how one chooses to truncate the sampling fraction and population proportion. The bound (2.5) demonstrates that the factor $1 - n/N$ in the exponential may apply for all $\lambda > 0$ as long as a suitable leading constant is selected.

Chatterjee [3] used Stein’s method to derive very general concentration bounds for statistics based on random permutations. For example, here is a restatement of his Proposition 1.1: let $\{a_{ij} : 1 \le i, j \le N\}$ be a collection of numbers in $[0, 1]$ and let $S = \sum_{i=1}^N a_{i,\pi(i)}$, where $\pi$ is uniformly distributed on all permutations of $\{1, \ldots, N\}$. Then

$$P\big(|S - E(S)| \ge t\big) \le 2\exp\left(-\frac{t^2}{4E(S) + 2t}\right) \quad \text{for all } t \ge 0.$$

The statistic $S$ was first studied by Hoeffding [9]. The special case which yields the setting of Serfling’s inequality is $a_{ij} := 1\{i \le n\}c_j$ for $1 \le i, j \le N$, where $1 \le n \le N$. Then $S = \sum_{i=1}^n c_{\pi(i)} \stackrel{d}{=} S_n$, where $S_n = \sum_{i=1}^n X_i$ is as defined in the first paragraph of Section 1 above. In this special case $E(S) = n\bar c_N = n\mu$ and Chatterjee’s (Bernstein-type) bound becomes

$$P\big(n^{-1/2}(S_n - n\mu) \ge \lambda\big) \le \exp\left(-\frac{\lambda^2}{4\mu + 2\lambda/\sqrt{n}}\right) \qquad (2.10)$$

for all $\lambda > 0$.
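The specialization $a_{ij} := 1\{i \le n\}c_j$ can be checked by brute force on a toy population: averaging over all $N!$ permutations reproduces the law of a size-$n$ sample without replacement. The sketch below does this with exact fractions; the function names and the population values used with it are arbitrary illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations


def hoeffding_statistic_law(a):
    """Law of S = sum_i a[i][pi(i)] with pi uniform over all permutations."""
    N = len(a)
    counts = Counter(sum(a[i][pi[i]] for i in range(N))
                     for pi in permutations(range(N)))
    total = sum(counts.values())
    return {s: Fraction(k, total) for s, k in counts.items()}


def sample_sum_law(c, n):
    """Law of S_n, the sum of n draws without replacement from population c."""
    counts = Counter(sum(sub) for sub in combinations(c, n))
    total = sum(counts.values())
    return {s: Fraction(k, total) for s, k in counts.items()}
```

With $c = (0, 1/4, 1/2, 3/4, 1)$, $n = 2$ and `a = [[c[j] if i < n else 0 for j in range(5)] for i in range(5)]`, the two functions return the same distribution, whose mean is $n\bar c_N = 1$.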

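Goldstein and Işlak [6] recently used a variant of Stein’s method to give another inequality for the tails of Hoeffding’s statistic $S$:

$$P\big(|S - E(S)| \ge t\big) \le 2\exp\left(-\frac{t^2}{2(\sigma_A^2 + 8\|a\|t)}\right), \qquad (2.11)$$

where $\|a\| = \max_{i,j \le N}|a_{ij} - a_{i\cdot} - a_{\cdot j} + a_{\cdot\cdot}|$,

$$a_{i\cdot} = \frac{1}{N}\sum_{j=1}^N a_{ij}, \quad a_{\cdot j} = \frac{1}{N}\sum_{i=1}^N a_{ij}, \quad a_{\cdot\cdot} = \frac{1}{N^2}\sum_{i,j=1}^N a_{ij}, \quad\text{and}\quad \sigma_A^2 = \frac{1}{N-1}\sum_{i,j \le N}(a_{ij} - a_{i\cdot} - a_{\cdot j} + a_{\cdot\cdot})^2.$$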


Specializing (2.11) to the setting of Serfling’s inequality (with $a_{ij} := 1\{i \le n\}c_j$) yields

$$P\big(n^{-1/2}|S_n - n\bar c_N| \ge \lambda\big) \le 2\exp\left(-\frac{\lambda^2/2}{\sigma_c^2(1 - f_n) + 8\|c\|\lambda/\sqrt{n}}\right) \qquad (2.12)$$


where $\sigma_c^2 = N^{-1}\sum_{j=1}^N (c_j - \bar c_N)^2$ and $\|c\| = \max_{j \le N}|c_j - \bar c_N|$. This Bernstein-type bound is in the same setting as Serfling’s inequality, but the bound has an explicit dependence on $\sigma_c^2$. This is similar to the bound of Bardenet and Maillard (1.6), which incorporates variance information through the parameter $\gamma^2$.
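The step from (2.11) to (2.12) rests on the identity $a_{ij} - a_{i\cdot} - a_{\cdot j} + a_{\cdot\cdot} = (1\{i \le n\} - n/N)(c_j - \bar c_N)$ for $a_{ij} = 1\{i \le n\}c_j$, which gives $\sigma_A^2 = n\sigma_c^2(1 - f_n)$ and $\|a\| = \max(n/N, 1 - n/N)\,\|c\| \le \|c\|$. The sketch below checks both facts in exact arithmetic; the helper name and the toy population are our own illustrative choices.

```python
from fractions import Fraction


def goldstein_islak_params(a):
    """Return (sigma_A^2, ||a||) for an N x N array a, built from the row,
    column and grand means a_{i.}, a_{.j}, a_{..} as in (2.11)."""
    N = len(a)
    row = [sum(a[i]) / N for i in range(N)]
    col = [sum(a[i][j] for i in range(N)) / N for j in range(N)]
    grand = sum(row) / N
    resid = [[a[i][j] - row[i] - col[j] + grand for j in range(N)]
             for i in range(N)]
    sigma2 = sum(r * r for rr in resid for r in rr) / (N - 1)
    norm = max(abs(r) for rr in resid for r in rr)
    return sigma2, norm
```

Feeding it `a = [[c[j] if i < n else Fraction(0) for j in range(N)] for i in range(N)]` for a population `c` of `Fraction` values reproduces $n\sigma_c^2(1 - f_n)$ exactly.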

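Further specialization of (2.12) to the (one-sided) hypergeometric setting (with $c_j = 1\{j \le D\}$ for $j = 1, \ldots, N$) yields

$$\begin{aligned}
P\big(n^{-1/2}(S_n - n(D/N)) \ge \lambda\big) &\le \exp\left(-\frac{\lambda^2/2}{(D/N)(1 - D/N)(1 - f_n) + 8\{(D/N) \vee (1 - D/N)\}\lambda/\sqrt{n}}\right)\\
&= \exp\left(-\frac{\lambda^2/2}{\sigma_N^2(1 - f_n) + 8\{\mu_N \vee (1 - \mu_N)\}\lambda/\sqrt{n}}\right). \qquad (2.13)
\end{aligned}$$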
This bound differs from the bound given in (2.9) (the Bernstein-type corollary of Theorem 3) only through the second term in the denominator inside the exponential: note that $8\{\mu_N \vee (1 - \mu_N)\} \ge 4 > 1/3$.

Comparing the Bernstein-type bounds (2.12) and (2.13) to Serfling’s inequality (1.3) with $b = 1$ and $a = 0$, we see that the bound of Goldstein and Işlak is smaller than Serfling’s bound when $\lambda < \sqrt{n}\,(1 - 4\sigma_c^2 + 4\sigma_c^2 f_n - f_n^*)/(32(\bar c_N \vee (1 - \bar c_N)))$. Similarly, we see that Chatterjee’s bound (2.10) is smaller than Serfling’s bound only if $\bar c_N < (1 - (n - 1)/N)/8$, and then only when $\lambda < \sqrt{n}(1 - (n - 1)/N - 8\bar c_N)/4$. Figure 3 gives a comparison of Serfling’s bound, Chatterjee’s bound, Bardenet and Maillard’s bound (1.6), Goldstein and Işlak’s bound (2.13), and the Bennett-type bound (2.7) in the further hypergeometric special case with $n = 100$, $N = 2001$, and $D \in \{101, 200\}$. Note that in the case $D = 200$, $\bar c_N = D/N \approx .10$, so $8\bar c_N \approx .8$, while $1 - (n - 1)/N \approx 1 - .05$; so the first condition holds, and Chatterjee’s bound should win approximately when $\lambda < \sqrt{n}(.15)/4 \approx 1.5/4$.

Comparing (2.13) to (2.10), we find the Goldstein-Işlak bound improves Chatterjee’s bound when $\lambda < \sqrt{n}\,(2\bar c_N - \sigma_N^2(1 - f_n))/(8(\bar c_N \vee (1 - \bar c_N)) - 1)$. In Figure 3, this region is approximately $\lambda < 0.08$ when $D = 101$ and $\lambda < 0.18$ when $D = 200$. From Figure 3(b) we see that the improvement of Goldstein and Işlak’s bound over those of Chatterjee and Serfling is very small in this region. From Figure 3(a) we see that Chatterjee’s bound is smaller than both the Goldstein and Işlak bound (2.13) and Serfling’s bound when $D/N$ is small and $0.08 < \lambda < \sqrt{n}(.55)/4 \approx 1.37$, but that all three are improved upon by Bardenet and Maillard’s bound (1.6) and the Bennett-type bound (2.7).
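The crossover values quoted above for Figure 3 ($n = 100$, $N = 2001$, $D \in \{101, 200\}$) follow from the closed-form conditions in the text. The sketch below evaluates the one-sided bounds (1.3), (2.10) and (2.13) together with those two crossover formulas; the function names are our own.

```python
import math


def serfling(lam, n, N):
    """Right-hand side of (1.3)."""
    return math.exp(-2 * lam**2 / (1 - (n - 1) / N))


def chatterjee(lam, n, mu):
    """Right-hand side of (2.10)."""
    return math.exp(-lam**2 / (4 * mu + 2 * lam / math.sqrt(n)))


def goldstein_islak(lam, n, D, N):
    """Right-hand side of (2.13)."""
    mu = D / N
    s2 = mu * (1 - mu) * (1 - (n - 1) / (N - 1))  # sigma_N^2 (1 - f_n)
    return math.exp(-(lam**2 / 2) / (s2 + 8 * max(mu, 1 - mu) * lam / math.sqrt(n)))


def chatterjee_beats_serfling_below(n, D, N):
    """sqrt(n)(1 - (n-1)/N - 8 mu)/4; meaningful only when positive."""
    return math.sqrt(n) * (1 - (n - 1) / N - 8 * D / N) / 4


def gi_beats_chatterjee_below(n, D, N):
    """sqrt(n)(2 mu - sigma_N^2 (1 - f_n)) / (8 (mu v (1 - mu)) - 1)."""
    mu = D / N
    s2 = mu * (1 - mu) * (1 - (n - 1) / (N - 1))
    return math.sqrt(n) * (2 * mu - s2) / (8 * max(mu, 1 - mu) - 1)
```

With these parameters the two Goldstein-Işlak crossovers come out near 0.084 and 0.184, and the Chatterjee-versus-Serfling crossover for $D = 200$ near 0.377, in line with the approximate values stated in the text.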

3. Convex Order for the Hypergeometric Distribution 

When sampling without replacement from a finite population concentrated on [0,1], the hypergeometric 
distribution occupies an extreme position with respect to convex order. This extreme position offers additional 
reason to give the hypergeometric distribution special consideration, since we might hope to adapt bounds 
for its tail to the tails of the random variables it dominates through the convex order. 

The extreme position of the hypergeometric distribution was essentially proved by Kemperman [14]. In his paper Kemperman studied (among many other things) finite populations majorized by nearly Rademacher populations; through transformation, this describes the hypergeometric setting. We say nearly Rademacher since Kemperman’s analysis resulted in majorizing populations consisting entirely of $-1$’s and $1$’s with the exception of a single exceptional element $a$ with $-1 < a < 1$.

Here, we revisit his argument, modified so it applies to a population with elements between 0 and 1. We then provide an extension of the argument in order to obtain a hypergeometric population which sub-majorizes this initial population. Since the extension follows naturally from Kemperman’s majorization result, we begin with his procedure here. We start with relevant definitions from Marshall, Olkin, and Arnold [16].

Definition 1. For a vector $x = (x_1, \ldots, x_N) \in \mathbb{R}^N$, let

$$x_{[1]} \ge x_{[2]} \ge \cdots \ge x_{[N]}$$

denote the components of $x$ in decreasing order.

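Definition 2. For $x, y \in \mathbb{R}^N$,

$$x \prec y \quad\text{if}\quad \begin{cases} \sum_{i=1}^k x_{[i]} \le \sum_{i=1}^k y_{[i]}, & k = 1, \ldots, N - 1,\\[2pt] \sum_{i=1}^N x_{[i]} = \sum_{i=1}^N y_{[i]}, & \end{cases}$$

where $x \prec y$ is read as “$x$ is majorized by $y$”.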
[Figure 1a legend: Bin, n = 250; Hg, n = 250; Bin, n = 1000; Hg, n = 1000; N = 2001. Figure 1b legend: λ = 1/4, 1/3, 1/2, 1; N = 2001.]

Fig 1: Comparison of Leon and Perron’s binomial bound (2.1) to the new hypergeometric bound (2.4). In sub-figure 1a, the sample size $n$ is set to 250 and 1000 for both bounds. The population size $N$ is taken to be 2001 in both cases. In the legend, lines with the description “Bin” correspond to the binomial bound of Leon and Perron (2.1), while lines with the description “Hg” correspond to the new hypergeometric bound (2.4). In sub-figure 1b, we plot the difference between Leon and Perron’s binomial bound (2.1) and the new hypergeometric bound (2.4) at the fixed deviation values $\lambda \in \{1/4, 1/3, 1/2, 1\}$. We let $n$ vary between 10 and 1000 to illustrate the impact of introducing the finite-sampling correction factor into the exponential term of the probability bound.
[Figure 2a: Serfling, Hush-Scovel, Theorem 1 and Theorem 3 bounds with n = 100, D = 200, N = 2001; horizontal axis λ from 0.0 to 1.0. Figure 2b: the same bounds with n = 100, D = 500, N = 2001.]

Fig 2: These plots compare the various exponential bounds for the hypergeometric distribution. In these plots we fix the population size at $N = 2001$ and the sample size at $n = 100$. The plots consider a setting with smaller variances by setting $D = 200$ in the first plot (so $D/N \approx 1/10$) and $D = 500$ in the second (so $D/N \approx 1/4$). We see that the bound of Theorem 1 (2.4) performs comparably with the bound of Theorem 3 (2.7) in the setting $D = 200$ (2a), and surpasses it when $D = 500$ (2b). This suggests that when $D/N < 1/10$ the bound of Theorem 3 will perform better than the bound of Theorem 1, and when $1/10 < D/N < 1/2$, the converse.
[Figure 3a: Serfling, Chatterjee, Theorem 3, Bardenet-Maillard and Goldstein-Işlak bounds with n = 100, D = 101, N = 2001. Figure 3b: the same bounds with n = 100, D = 200, N = 2001.]

Fig 3: Comparison of Serfling’s bound (1.3), Chatterjee’s bound (2.10), the bound of Goldstein and Işlak (2.13), Bardenet and Maillard’s bound (1.6), and Theorem 3. In sub-figure 3a, the sample size is $n = 100$, the population size is $N = 2001$, and the number of successes is $D = 101$. In sub-figure 3b, the sample size remains $n = 100$, the population size remains $N = 2001$, but $D = 200$.
Fig 4: The initial population line in the display corresponds to $c = \{0, 1/14, 2/14, \ldots, 13/14, 1\}$. The majorizing population contains seven 0’s, seven 1’s, and a single exceptional element of 1/2. The sub-majorizing population contains seven 0’s and eight 1’s. In the display, each population is sorted in decreasing order; the corresponding lines show the cumulative sum of the ordered population elements.


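Definition 3. For $x, y \in \mathbb{R}^N$,

$$x \prec_w y \quad\text{if}\quad \sum_{i=1}^k x_{[i]} \le \sum_{i=1}^k y_{[i]}, \qquad k = 1, \ldots, N,$$

where $x \prec_w y$ is read as “$x$ is weakly sub-majorized by $y$” or, more briefly, “$x$ is sub-majorized by $y$”.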

Figure 4 provides an illustration of these definitions. In the following Lemma, we re-state Kemperman’s 
procedure so it constructs a majorizing hypergeometric population. See section 4, pages 165-168 in [14] for 
the original Rademacher argument. 

Lemma 1. (Kemperman [14]) For any finite population $x \in \mathbb{R}^N$ such that $0 \le x_i \le 1$ for all $1 \le i \le N$, there exists a population $c \in \mathbb{R}^N$, consisting only of 0’s, 1’s, and at most a single element between 0 and 1, which majorizes the original population. In fact, $c$ consists of $D$ 1’s, $N - D - 1$ 0’s, and a number $a \in [0, 1)$, where $D$ and $a$ are determined by $D = \lfloor N\bar x_N \rfloor$ and $a = N\bar x_N - D$.

Proof. We update Kemperman’s argument, and demonstrate that his modified algorithm, described in the display, produces the population claimed by Lemma 1. Suppose first that $x \in \mathbb{R}^2$ and $0 \le x_1, x_2 \le 1$. Identify ipop $= x$ in the algorithm description. Then mpop $= x$, plen $= 2$, and plen $- 1 = 1$. Hence, the “for” loop executes exactly once.

Consider the operations in the “for” loop. We compute csum $=$ mpop[1] $+$ mpop[2] $= x_1 + x_2$. If csum $\ge 1$, the first condition is met, and we set mpop[1] $= 1$ and mpop[2] $=$ csum $- 1$. Since csum $= x_1 + x_2$ and $0 \le x_1, x_2 \le 1$ by assumption, we have $1 \le$ csum $\le 2$. Hence $0 \le$ csum $- 1 \le 1$, and so $0 \le$ mpop[2] $\le 1$. Observe that mpop[1] $= 1 \ge x_1 \vee x_2$ and mpop[1] $+$ mpop[2] $= 1 + (x_1 + x_2 - 1) = x_1 + x_2$, so that mpop now
Algorithm 1 Kemperman’s majorization algorithm

    function Majorize(ipop)                  ▷ ipop is the input population
        plen ← length(ipop)
        mpop ← ipop                          ▷ make a copy of the input population to transform
        for i ∈ {1, …, plen − 1} do
            csum ← mpop[i] + mpop[i + 1]
            if csum ≥ 1 then
                mpop[i] ← 1
                mpop[i + 1] ← csum − 1
            else
                mpop[i] ← 0
                mpop[i + 1] ← csum
        return mpop                          ▷ mpop is now transformed into the desired population


majorizes $x$ as claimed. If csum $< 1$, the algorithm sets mpop[2] $= x_1 + x_2$ and mpop[1] $= 0$. Again, this satisfies the description in the lemma, since $0 \le$ csum $= x_1 + x_2 < 1$. Moreover, mpop[2] $= x_1 + x_2 \ge x_1 \vee x_2$ and mpop[2] $+$ mpop[1] $= x_1 + x_2 + 0 = x_1 + x_2$, and so again mpop majorizes the population $x$. This completes the base case. Observe that the exceptional element is in the final index of the vector.

For the inductive case, suppose Kemperman’s algorithm works for populations of size $N$. We will show it holds for populations of size $N + 1$. Let $x \in \mathbb{R}^{N+1}$ be a population whose elements are all between 0 and 1. Let $y \in \mathbb{R}^N$ be constructed so that $y_i = x_i$ for $1 \le i \le N$. Run the algorithm on $y$. By the induction hypothesis, this produces a vector $m \in \mathbb{R}^N$ such that $m_i \in \{0, 1\}$ for $1 \le i \le N - 1$. Moreover, $0 \le m_N \le 1$ by the induction hypothesis, and also $m$ majorizes $y$.

Next, construct a vector $a \in \mathbb{R}^2$ such that $a_1 = m_N$ and $a_2 = x_{N+1}$. Run the algorithm on $a$. By the base case, this produces a new vector $b \in \mathbb{R}^2$ such that $b_1 \in \{0, 1\}$ and $0 \le b_2 \le 1$. Note also that $b$ majorizes $a$.

Finally, construct a vector $c \in \mathbb{R}^{N+1}$ such that $c_i = m_i$ for $1 \le i \le N - 1$, $c_N = b_1$, and $c_{N+1} = b_2$. By construction, we have that $c_i \in \{0, 1\}$ for $1 \le i \le N$ and $0 \le c_{N+1} \le 1$. Hence, if we can show that $c$ majorizes $x$, we are done. We first show that


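$$\sum_{i=1}^{N+1} c_i = \sum_{i=1}^{N+1} x_i.$$

Using the constructions, we have

$$\begin{aligned}
\sum_{i=1}^{N+1} c_i &= \sum_{i=1}^{N-1} c_i + c_N + c_{N+1} = \sum_{i=1}^{N-1} m_i + b_1 + b_2 = \sum_{i=1}^{N-1} m_i + m_N + x_{N+1}\\
&= \sum_{i=1}^{N} m_i + x_{N+1} = \sum_{i=1}^{N} y_i + x_{N+1} = \sum_{i=1}^{N} x_i + x_{N+1} = \sum_{i=1}^{N+1} x_i,
\end{aligned}$$

and so the summation claim holds. Next, pick $1 \le k \le N + 1$. Then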

$$\sum_{i=1}^k c_{[i]} \ge \sum_{i=1}^k x_{[i]},$$

since by construction, $c_i \in \{0, 1\}$ for $1 \le i \le N$. As $0 \le x_i \le 1$ for $1 \le i \le N + 1$, if $k$ is small enough so that

$$\sum_{i=1}^k c_{[i]} < \sum_{i=1}^{N+1} c_{[i]} = \sum_{i=1}^{N+1} c_i,$$

then all terms in the summation $\sum_{i=1}^k c_{[i]}$ must be 1, and so are greater than or equal to the corresponding $x$’s. If $k$ is large enough so that

$$\sum_{i=1}^k c_{[i]} = \sum_{i=1}^{N+1} c_i$$

(which means the remaining $N + 1 - k$ elements are all 0), then we also have

$$\sum_{i=1}^k c_{[i]} \ge \sum_{i=1}^k x_{[i]},$$

since we have already seen that the sum over all the $x$’s equals the sum over all the $c$’s. As all the $x$’s are between 0 and 1 by assumption, this proves the claim. □

Lemma 2. For any finite population $x \in \mathbb{R}^N$ such that $0 \le x_i \le 1$ for all $1 \le i \le N$, there exists a population $z \in \mathbb{R}^N$, consisting only of 0’s and 1’s, which sub-majorizes the original population.

Proof. Consider a finite population $x \in \mathbb{R}^N$ which obeys the hypotheses. Using Lemma 1, we may construct a population $y \in \mathbb{R}^N$ which majorizes $x$. By Lemma 1, we know that $y$ consists only of 0’s, 1’s, and at most a single exceptional element $y_N$ between 0 and 1.

If the exceptional element is either exactly 0 or exactly 1, we are done. So, suppose $0 < y_N < 1$. Create a new population $z \in \mathbb{R}^N$ such that $z_i = y_i$ for $1 \le i \le N - 1$, and $z_N = 1$. This population $z$ then sub-majorizes $y$ and hence sub-majorizes $x$, completing the proof. □
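Algorithm 1 and the rounding step in the proof of Lemma 2 translate directly into Python. The sketch below is our own transcription (the function names are hypothetical), using `fractions.Fraction` so that the population of Figure 4 is handled without rounding error; `dominates` checks Definitions 2 and 3 by comparing partial sums of the decreasingly sorted vectors.

```python
from fractions import Fraction


def majorize(ipop):
    """Algorithm 1: one left-to-right pass over a population with entries in
    [0, 1]. Each visited entry is set to 0 or 1 while the pair total is kept,
    so only the last entry can remain strictly between 0 and 1."""
    mpop = list(ipop)  # copy; the input population is not modified
    for i in range(len(mpop) - 1):
        csum = mpop[i] + mpop[i + 1]
        if csum >= 1:
            mpop[i], mpop[i + 1] = 1, csum - 1
        else:
            mpop[i], mpop[i + 1] = 0, csum
    return mpop


def sub_majorize(ipop):
    """Lemma 2: round the exceptional (last) element of majorize(ipop) up to 1."""
    mpop = majorize(ipop)
    if mpop and 0 < mpop[-1] < 1:
        mpop[-1] = 1
    return mpop


def dominates(y, x, weak=False):
    """True if y majorizes x (Definition 2), or, with weak=True, if x is
    sub-majorized by y (Definition 3). Assumes len(x) == len(y)."""
    px = py = 0
    for xi, yi in zip(sorted(x, reverse=True), sorted(y, reverse=True)):
        px, py = px + xi, py + yi
        if py < px:
            return False
    return weak or px == py
```

On the Figure 4 population `[Fraction(k, 14) for k in range(15)]`, `majorize` returns seven 0’s, seven 1’s and a final 1/2, matching $D = \lfloor N\bar x_N \rfloor = 7$ and $a = 1/2$ from Lemma 1, and `sub_majorize` returns seven 0’s and eight 1’s.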

Lemma 3. Suppose $x \in \mathbb{R}^N$ is a population consisting only of 0’s, 1’s, and a single exceptional element $x_1$ such that $0 < x_1 < 1$. Suppose $y \in \mathbb{R}^N$ is a population whose elements are the same as those in $x$, except $y_1 = 1$, so that $y_1 > x_1$. Let $X_1, \ldots, X_n$ denote a sample without replacement from $x$, and $Y_1, \ldots, Y_n$ denote a sample without replacement from $y$, $1 \le n \le N$. Finally, suppose $\phi$ is a continuous convex increasing function on $\mathbb{R}$. Then

$$E\phi\Big(\sum_{i=1}^n X_i\Big) \le E\phi\Big(\sum_{i=1}^n Y_i\Big). \qquad (3.1)$$


Proof. We adapt Kemperman’s (1973) argument for Rademacher populations to the current setting of hypergeometric sub-majorization.

Observe that

$$E\phi\Big(\sum_{i=1}^n X_i\Big) = \binom{N}{n}^{-1}\sum \phi(x_{i_1} + \cdots + x_{i_n}),$$

where the sum is over all sets of indices $1 \le i_1 < i_2 < \cdots < i_n \le N$. The same holds for sampling without replacement from $y$, with suitable substitution. Therefore

$$E\phi\Big(\sum_{i=1}^n Y_i\Big) - E\phi\Big(\sum_{i=1}^n X_i\Big) = \binom{N}{n}^{-1}\sum \big(\phi(y_1 + y_{i_2} + \cdots + y_{i_n}) - \phi(x_1 + x_{i_2} + \cdots + x_{i_n})\big),$$

where the sum is over all distinct indices $2 \le i_2 < i_3 < \cdots < i_n \le N$. Note that sets of indices with $i_1 > 1$ cancel out by definition of the two populations. Since $y_i = x_i$ for $i \ge 2$, $y_1 > x_1$, and $\phi$ is assumed convex increasing, each term of the sum is nonnegative. Hence, the entire sum is nonnegative as well. This gives the claim. □
We next specialize a proposition stated in Marshall, Olkin, and Arnold [16, p. 455] to the current problem. Proof of the general statement given there is credited to Karlin, and proof for the specific cases of sampling with and without replacement to Kemperman. Since a proof is given in Marshall, Olkin, and Arnold, we simply state the result here.

Lemma 4. Let $x \in \mathbb{R}^N$ be an arbitrary finite population. Let $y \in \mathbb{R}^N$ be a finite population which majorizes $x$. Let $X_1, \ldots, X_n$ denote a sample without replacement from $x$, and $Y_1, \ldots, Y_n$ denote a sample without replacement from $y$, $1 \le n \le N$. Finally, suppose $\phi$ is a continuous convex increasing function on $\mathbb{R}$. Then

$$E\phi\Big(\sum_{i=1}^n X_i\Big) \le E\phi\Big(\sum_{i=1}^n Y_i\Big).$$

Note that Lemma 4 requires majorization between populations. We may combine the preceding lemmas 
to demonstrate the following claim. 

Theorem 4. For any finite population $x \in \mathbb{R}^N$ such that $0 \le x_i \le 1$ for all $1 \le i \le N$, there exists a population $y \in \mathbb{R}^N$, consisting only of 0’s and 1’s, which sub-majorizes the original population. Let $X_1, \ldots, X_n$ denote a sample without replacement from $x$, and $Y_1, \ldots, Y_n$ denote a sample without replacement from $y$, $1 \le n \le N$. Finally, suppose $\phi$ is a continuous convex increasing function on $\mathbb{R}$. Then

$$E\phi\Big(\sum_{i=1}^n X_i\Big) \le E\phi\Big(\sum_{i=1}^n Y_i\Big).$$


Proof. Suppose $x \in \mathbb{R}^N$ is a finite population which satisfies the hypotheses. We may use Lemma 1 to construct a population $z \in \mathbb{R}^N$ such that $z$ majorizes $x$, and $z$ consists only of 0’s, 1’s, and at most a single exceptional element between 0 and 1. For $1 \le n \le N$, let $Z_1, \ldots, Z_n$ denote a sample without replacement from $z$. By Lemma 4, we then have the order

$$E\phi\Big(\sum_{i=1}^n X_i\Big) \le E\phi\Big(\sum_{i=1}^n Z_i\Big). \qquad (3.2)$$

Next, by Lemma 2 we may construct a population $y \in \mathbb{R}^N$ consisting only of 0’s and 1’s that sub-majorizes $z$. Then by (3.1) we have

$$E\phi\Big(\sum_{i=1}^n Z_i\Big) \le E\phi\Big(\sum_{i=1}^n Y_i\Big). \qquad (3.3)$$

Combining (3.2) and (3.3) proves the claim. □
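Theorem 4 can be checked numerically on the three populations of Figure 4 by enumerating every size-$n$ subset. The sketch below does so for any user-supplied function $\phi$; it is an illustration only, and the helper name, the choices of $\phi$ (here $e^{0.7s}$ and $\max(s - 2, 0)$, both convex and increasing) and the sample sizes are arbitrary.

```python
import math
from fractions import Fraction
from itertools import combinations


def expected_phi_of_sample_sum(pop, n, phi):
    """E phi(X_1 + ... + X_n) for a sample of size n drawn without replacement
    from pop, computed by averaging over all n-subsets of positions."""
    subsets = list(combinations(pop, n))
    return sum(phi(float(sum(s))) for s in subsets) / len(subsets)


x = [Fraction(k, 14) for k in range(15)]   # initial population of Figure 4
z = [0] * 7 + [1] * 7 + [Fraction(1, 2)]   # majorizing population (Lemma 1)
y = [0] * 7 + [1] * 8                      # sub-majorizing population (Lemma 2)
```

For each convex increasing $\phi$ and each $n$, the three expectations computed from `x`, `z` and `y` come out in nondecreasing order, as (3.2) and (3.3) predict.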

Inequality (2.7) provides an opportunity to apply Theorem 4. Recalling the notation of the introduction, let $c := \{c_1, \ldots, c_N\}$ be a population such that $0 \le c_i \le 1$ for $1 \le i \le N$, $a = 0$, $b = 1$, and $\bar c_N + 1/N \le 1/2$. Using Kemperman’s algorithm, as stated in Lemma 1, there exists a population $m := \{m_1, \ldots, m_N\}$ which majorizes $c$ such that $m_i \in \{0, 1\}$ for $1 \le i \le N - 1$, and $0 \le m_N \le 1$. In the following, suppose $0 < m_N < 1$, since if $m_N = 0$ or $m_N = 1$ we can apply (2.7) directly.

Since $c \prec m$, we have $\bar m_N = \bar c_N$. Using Lemma 2, there is a population $\{h_1, \ldots, h_N\}$ with $h_i \in \{0, 1\}$ for $1 \le i \le N$ that sub-majorizes $c$. By the preceding construction, we have $h_i = m_i$ for $1 \le i \le N - 1$, and $h_N = 1 > m_N$.

Without loss of generality, re-label m and h so that: for 1 ≤ i ≤ D − 1 we have h_i = m_i = 1; for i = D we have h_D = 1 > m_D > 0; for D + 1 ≤ i ≤ N we have h_i = m_i = 0. Denote the exceptional element of m by α := m_D. With the populations so modified, we derive bounds for the difference between the population means. By construction,

$$ \mu_h - \mu_m = \frac{h_D - m_D}{N} = \frac{1-\alpha}{N} > 0, $$

and we thus have

$$ \mu_m < \mu_h < \mu_m + \frac1N \le \frac12. \tag{3.4} $$


Suppose then that we sample n ≤ D items without replacement from c. Let X_i denote the sample without replacement from c, and let H_i denote a corresponding sample without replacement from h. Then for t > 0

\begin{align}
P\Big(\sum_{i=1}^n X_i - n\mu_c \ge t\Big)
&\le \inf_{r>0}\frac{E\exp\big(r\sum_{i=1}^n X_i\big)}{\exp(rt + rn\mu_c)} \nonumber\\
&\le \inf_{r>0}\frac{E\exp\big(r\sum_{i=1}^n H_i\big)}{\exp(rt + rn\mu_c)} \tag{3.5}\\
&= \inf_{r>0}\exp\big(rn(\mu_h-\mu_c)\big)\,\frac{E\exp\big(r\sum_{i=1}^n (H_i-\mu_h)\big)}{\exp(rt)} \nonumber\\
&\le \inf_{r>0}\exp\Big(r\frac{n}{N}\Big)\,\frac{E\exp\big(r\sum_{i=1}^n (H_i-\mu_h)\big)}{\exp(rt)} \tag{3.6}\\
&\le \inf_{r>0}\exp\Big(r\frac{n}{N} - rt + \mathrm{Var}\Big(\sum_{i=1}^n H_i\Big)(e^r-1-r)\Big) \tag{3.7}\\
&= \inf_{r>0}\exp\Big(r\frac{n}{N} - rt + n\gamma^2(e^r-1-r)\Big), \tag{3.8}
\end{align}

where in the final line we write γ² := (1/n)Var(Σ_{i=1}^n H_i). The inequality at (3.6) follows by (3.4), since μ_m = μ_c. The inequality at (3.5) follows by Theorem 4, applied with the convex increasing function φ(x) = e^{rx}. The inequality at (3.7) follows by Shorack and Wellner page 852, display (b) [20].

At this point we may continue from (3.8) and optimize over r. Doing so yields an optimal choice of

$$ r^* = \log\Big(1 + \frac{Nt - n}{nN\gamma^2}\Big). $$

Using this value, however, yields an exponential bound that is somewhat difficult to compare to (2.7). If instead we simply choose

$$ \tilde r = \log\Big(1 + \frac{t}{n\gamma^2}\Big), $$

we obtain a bound similar in performance to the bound we find using r*, but with the benefit of easy comparison to (2.7). The choice r̃ corresponds to the optimal value of r when the original population is majorized by a hypergeometric population. We continue from (3.8) using r̃, and obtain
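The trade-off between r* and r̃ can be checked numerically. The Python sketch below is our own illustration, not part of the original analysis. It does three things: it evaluates the exponent in (3.8), confirms on a grid that r* minimizes it, and checks that the exponent at r̃ has the form shown in (3.9). It assumes ψ(x) = 2[(1 + x)log(1 + x) − x]/x². This Bennett-type function is the one for which (3.9) follows algebraically from (3.8). The parameter values n, N, t, γ² are arbitrary.

```python
import math

def exponent_38(r, n, N, t, gamma2):
    """Exponent inside the infimum in (3.8): r*n/N - r*t + n*gamma2*(e^r - 1 - r)."""
    return r * n / N - r * t + n * gamma2 * (math.exp(r) - 1.0 - r)

def r_star(n, N, t, gamma2):
    """Closed-form minimizer of (3.8); it is positive only when t > n/N."""
    return math.log(1.0 + (N * t - n) / (n * N * gamma2))

def r_tilde(n, t, gamma2):
    """The simpler choice log(1 + t/(n*gamma2)) used to obtain (3.9)."""
    return math.log(1.0 + t / (n * gamma2))

def psi(x):
    """Assumed Bennett-type function psi(x) = 2((1+x)log(1+x) - x)/x^2, x > 0."""
    return 2.0 * ((1.0 + x) * math.log1p(x) - x) / (x * x)

# Illustrative (hypothetical) parameter values.
n, N, t, gamma2 = 60, 500, 5.0, 0.1
rs, rt = r_star(n, N, t, gamma2), r_tilde(n, t, gamma2)
x = t / (n * gamma2)
# Exponent of the right-hand side of (3.9): (n/N) log(1+x) - t^2/(2 n gamma2) psi(x).
exponent_39 = (n / N) * math.log1p(x) - t**2 / (2.0 * n * gamma2) * psi(x)

print(f"r* = {rs:.4f}, r~ = {rt:.4f}")
print(f"exponent at r*: {exponent_38(rs, n, N, t, gamma2):.6f}")
print(f"exponent at r~: {exponent_38(rt, n, N, t, gamma2):.6f}, (3.9) form: {exponent_39:.6f}")
```

The exponent is strictly convex in r (its second derivative is nγ²e^r > 0), so the stationary point r* is the global minimizer over r > 0.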


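$$ P\Big(\sum_{i=1}^n X_i - n\mu_c \ge t\Big) \le \exp\Big(\frac nN\log\Big(1+\frac{t}{n\gamma^2}\Big)\Big)\exp\Big(-\frac{t^2}{2n\gamma^2}\,\psi\Big(\frac{t}{n\gamma^2}\Big)\Big) = \Big(1+\frac{t}{n\gamma^2}\Big)^{n/N}\exp\Big(-\frac{t^2}{2n\gamma^2}\,\psi\Big(\frac{t}{n\gamma^2}\Big)\Big). \tag{3.9} $$

Writing λ = t/√n, and substituting γ² = (D/N)(1 − D/N)(1 − f_n) = σ_N²(1 − f_n), we obtain the following bound:

$$ P\big(\sqrt n(\bar X_n - \mu_c) \ge \lambda\big) \le \Big(1 + \frac{\lambda}{\sqrt n\,\sigma_N^2(1-f_n)}\Big)^{n/N}\exp\Big(-\frac{\lambda^2}{2\sigma_N^2(1-f_n)}\,\psi\Big(\frac{\lambda}{\sqrt n\,\sigma_N^2(1-f_n)}\Big)\Big). $$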



In this form, the cost of sub-majorization is clear when compared to (2.7): we incur the leading term outside 
the exponent. By shifting and scaling the population, we may use this bound to obtain the following theorem 
for the general problem of sampling without replacement: 

Theorem 5. Let c := {c_1, ..., c_N} be a population with a = min_{1≤i≤N} c_i and b = max_{1≤i≤N} c_i both finite. Let d := {(c_1 − a)/(b − a), ..., (c_N − a)/(b − a)}. Suppose first that d is majorized by a Hypergeometric population such that D/N ≤ 1/2. From this Hypergeometric population define σ_N² := (D/N)(1 − D/N). Then the following bound holds for a sample without replacement of n ≤ D items from c:

$$ P\big(\sqrt n(\bar X_n - \mu_c) \ge \lambda\big) \le \exp\Big(-\frac{\lambda^2}{2(b-a)^2\sigma_N^2(1-f_n)}\,\psi\Big(\frac{\lambda}{\sqrt n(b-a)\sigma_N^2(1-f_n)}\Big)\Big). \tag{3.10} $$

If instead μ_d + 1/N ≤ 1/2, then the following bound holds for a sample without replacement of n ≤ D items from c:

$$ P\big(\sqrt n(\bar X_n - \mu_c) \ge \lambda\big) \le \Big(1 + \frac{\lambda}{\sqrt n(b-a)\sigma_N^2(1-f_n)}\Big)^{n/N}\exp\Big(-\frac{\lambda^2}{2(b-a)^2\sigma_N^2(1-f_n)}\,\psi\Big(\frac{\lambda}{\sqrt n(b-a)\sigma_N^2(1-f_n)}\Big)\Big). \tag{3.11} $$
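Here is a minimal Python sketch for evaluating the two bounds of Theorem 5 alongside Serfling's bound (1.1). It is an illustration with hypothetical function names. It assumes ψ(x) = 2[(1 + x)log(1 + x) − x]/x², which is the form consistent with deriving (3.9) from (3.8). The example parameters (n = m = 250, b − a = N − 1, σ_N² = 1/4) mirror the Wilcoxon setting discussed below.

```python
import math

def psi(x):
    """Assumed Bennett-type function psi(x) = 2((1+x)log(1+x) - x)/x^2, x > 0."""
    return 2.0 * ((1.0 + x) * math.log1p(x) - x) / (x * x)

def _scaled_variance(n, N, sigma2):
    """sigma_N^2 (1 - f_n) with f_n = (n - 1)/(N - 1)."""
    return sigma2 * (1.0 - (n - 1) / (N - 1))

def bound_310(lam, n, N, width, sigma2):
    """Right-hand side of (3.10); width = b - a, sigma2 = (D/N)(1 - D/N)."""
    v = _scaled_variance(n, N, sigma2)
    x = lam / (math.sqrt(n) * width * v)
    return math.exp(-lam**2 / (2.0 * width**2 * v) * psi(x))

def bound_311(lam, n, N, width, sigma2):
    """Right-hand side of (3.11): the (3.10) bound times (1 + x)^(n/N)."""
    v = _scaled_variance(n, N, sigma2)
    x = lam / (math.sqrt(n) * width * v)
    return (1.0 + x) ** (n / N) * bound_310(lam, n, N, width, sigma2)

def serfling(lam, n, N, width):
    """Serfling's bound (1.1) with f*_n = (n - 1)/N."""
    return math.exp(-2.0 * lam**2 / ((1.0 - (n - 1) / N) * width**2))

# Example values mirroring the Wilcoxon setting: n = m = 250, b - a = N - 1, sigma_N^2 = 1/4.
n, m = 250, 250
N = n + m
for lam in (25.0, 50.0, 100.0, 200.0):
    print(lam, bound_310(lam, n, N, N - 1, 0.25),
          bound_311(lam, n, N, N - 1, 0.25), serfling(lam, n, N, N - 1))
```

Since (1 + x)^{n/N} ≥ 1, the sub-majorization bound (3.11) is never smaller than (3.10) for the same σ_N². This is the cost described in the text.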

Two-sample rank tests provide an opportunity to explore the behavior of the bounds of Theorem 5. Following the exposition in Chapter 4 of Hajek, Sidak, and Sen [8] (with the notation modified), let Y_1, ..., Y_n and Z_1, ..., Z_m be random samples with continuous distributions F_Y and F_Z. Form the pooled sample Y_{n+j} = Z_j, j = 1, ..., m, and N = n + m. Let R_i (i = 1, ..., N) denote the rank of the observation Y_i in the ordered sequence Y_(1) < Y_(2) < ... < Y_(N). To test the null hypothesis H_0: F_Y = F_Z against alternatives of shifts in location, one may use the Wilcoxon test (see page 96 of [8]). The test statistic, expectation under the null, and variance under the null are

$$ S_W := \sum_{i=1}^n R_i, \qquad E S_W = \frac12\,n(n+m+1), \qquad \mathrm{Var}(S_W) = \frac1{12}\,nm(n+m+1). $$

Under the null, S_W may be viewed as the sum in a sample without replacement from the population c_W := {1, 2, ..., N}, where a = 1 and b = N. Shifting and scaling the population produces d_W := {0, 1/(N − 1), ..., (N − 2)/(N − 1), 1}. If N is even, then d_W is majorized by a Hypergeometric population containing N/2 1's and N/2 0's, and hence σ_N² = (D/N)(1 − D/N) = 1/4. If we additionally assume n ≤ m, we may use (3.10) to study its finite sample behavior. Doing so, with X̄_n := S_W/n, we find for λ > 0

$$ P\Big(\sqrt n\Big(\bar X_n - \frac{N+1}{2}\Big) \ge \lambda\Big) \le \exp\Big(-\frac{2\lambda^2}{(n+m-1)^2(1-f_n)}\,\psi\Big(\frac{4\lambda}{\sqrt n\,(n+m-1)(1-f_n)}\Big)\Big). $$

Serfling's bound (1.1) may be applied in this case as well; through its application we find

$$ P\Big(\sqrt n\Big(\bar X_n - \frac{N+1}{2}\Big) \ge \lambda\Big) \le \exp\Big(-\frac{2\lambda^2}{(n+m-1)^2\big(1 - \frac{n-1}{n+m}\big)}\Big). $$
Finally, we may use Bardenet and Maillard's bound (1.6) with δ = δ_f = 1 x 10 - ' to analyze the situation as well. Figure 5 compares the performance of these three bounds when n = m = 250. In this case, we see that the bounds are comparable: Bardenet and Maillard's bound performs best, and Serfling's bound outperforms (3.10). This occurs because the variance component (D/N)(1 − D/N) that appears in the bound (equal to 1/4 when n = m = 250) is the variance of the majorizing hypergeometric population rather than the variance of the shifted and scaled population (which is close to 1/12 when n = m = 250). Bardenet and Maillard's bound performs well because it incorporates information about the variance of the untransformed population into its bound.
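The null moments of S_W quoted above can be confirmed by brute force for small samples. Under H_0, every set of n ranks drawn from {1, ..., N} is equally likely. The following Python sketch is an illustration using exact rational arithmetic. It enumerates every such rank set and compares the resulting moments with the formulas for E S_W and Var(S_W).

```python
from fractions import Fraction
from itertools import combinations

def wilcoxon_null_moments(n, m):
    """Exact null mean and variance of S_W, enumerating all equally likely rank sets."""
    N = n + m
    sums = [sum(ranks) for ranks in combinations(range(1, N + 1), n)]
    mean = Fraction(sum(sums), len(sums))
    var = Fraction(sum(s * s for s in sums), len(sums)) - mean**2
    return mean, var

for n, m in [(2, 3), (3, 4), (4, 4)]:
    mean, var = wilcoxon_null_moments(n, m)
    print(n, m, mean, var)
```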

Another example is found in the Klotz test, which is used to test the null H_0: F_Y = F_Z against alternatives of differences in scale (see page 104 of [8]). Recalling N := n + m, the test statistic, expectation under the null, and variance under the null are

$$ S_K := \sum_{i=1}^n \Big[\Phi^{-1}\Big(\frac{R_i}{N+1}\Big)\Big]^2, \qquad E S_K = \frac{n}{N}\sum_{i=1}^N \Big[\Phi^{-1}\Big(\frac{i}{N+1}\Big)\Big]^2, $$

and

$$ \mathrm{Var}(S_K) = \frac{nm}{N(N-1)}\sum_{i=1}^N \Big[\Phi^{-1}\Big(\frac{i}{N+1}\Big)\Big]^4 - \frac{m}{n(N-1)}\,(E S_K)^2. $$

Defining the population

$$ c_K := \Big\{\Big[\Phi^{-1}\Big(\frac{i}{N+1}\Big)\Big]^2,\ 1 \le i \le N\Big\}, $$

we may view S_K under the null as the sum in a sample without replacement from c_K. If n + m = 500, we may compute c_K, and find a ≈ 6.26 × 10⁻⁶ and b ≈ 8.29. Shifting and scaling the population produces d_K, which is bounded by 0 and 1. This population is majorized by a population containing 59 1's, 440 0's, and a single exceptional element approximately equal to 0.044. Hence, it is sub-majorized by a population containing 60 1's and 440 0's. Supposing that n = 60 and m = 440, we may use (3.11) to analyze this scenario, since the mean of the sub-majorizing population is 3/25 (also note (D/N)(1 − D/N) = 66/625 in this case). Doing so (with the conservative approximation b − a ≈ 8.29), we find for λ > 0 that

$$ P\big(\sqrt{60}(\bar X_{60} - \mu_K) \ge \lambda\big) \le \Big(1 + \frac{\lambda}{\sqrt{60}(8.29)(66/625)(440/499)}\Big)^{60/500}\exp\Big(-\frac{\lambda^2}{2(8.29)^2(66/625)(440/499)}\,\psi\Big(\frac{\lambda}{\sqrt{60}(8.29)(66/625)(440/499)}\Big)\Big). $$

Once again, we may apply Serfling's uniform bound. Doing so here, we find

$$ P\big(\sqrt{60}(\bar X_{60} - \mu_K) \ge \lambda\big) \le \exp\Big(-\frac{2\lambda^2}{(441/500)(8.29)^2}\Big). \tag{3.12} $$

As in the Wilcoxon example, we may use Bardenet and Maillard's bound (1.6) with δ = δ_f = 1 x 10~' to analyze the situation. Figure 5 also compares the performance of these three bounds for the special case n = 60 and m = 440. In this case, we again see that Bardenet and Maillard's bound performs best, but that the bound obtained via sub-majorization now improves on Serfling's result. This is because the variance component of the sub-majorizing hypergeometric population, (D/N)(1 − D/N) = 66/625 = 0.1056 < 1/4, reflects some of the variability of the untransformed population. However, the untransformed population d_K has variance σ² ≈ 0.0258. This variability is captured in the bound of Bardenet and Maillard, which explains its improved performance.
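The numerical claims in the Klotz example can be reproduced with Python's standard library. The sketch below is an illustration under stated assumptions, and its helper names are hypothetical:

- it uses statistics.NormalDist().inv_cdf for Φ⁻¹;
- it assumes the majorizing population consists of ⌊Σ d_i⌋ ones, one exceptional element, and zeros, which is the shape stated in the text and which preserves the population total;
- it checks the null variance formula for S_K against brute-force enumeration for small n and m.

```python
import math
from itertools import combinations
from statistics import NormalDist, pvariance

def klotz_population(N):
    """c_K = {Phi^{-1}(i/(N+1))^2 : 1 <= i <= N}, Phi the standard normal cdf."""
    inv = NormalDist().inv_cdf
    return [inv(i / (N + 1)) ** 2 for i in range(1, N + 1)]

def klotz_null_var(n, m):
    """Null variance of S_K from the displayed formula."""
    N = n + m
    c = klotz_population(N)
    ES = n / N * sum(c)
    return n * m / (N * (N - 1)) * sum(ci**2 for ci in c) - m / (n * (N - 1)) * ES**2

def brute_force_var(n, m):
    """Variance of a without-replacement sample sum, enumerating all subsets."""
    c = klotz_population(n + m)
    sums = [sum(c[i] for i in idx) for idx in combinations(range(n + m), n)]
    mean = sum(sums) / len(sums)
    return sum((s - mean) ** 2 for s in sums) / len(sums)

# Recompute the quantities quoted for N = 500.
c = klotz_population(500)
a, b = min(c), max(c)
d = [(ci - a) / (b - a) for ci in c]
total = sum(d)                    # total of d_K, preserved by majorization
ones = math.floor(total)          # number of 1's in the majorizing population
exceptional = total - ones        # the single exceptional element
print(f"a = {a:.3e}, b = {b:.3f}")
print(f"majorizing: {ones} ones + {exceptional:.3f}; sub-majorizing D = {math.ceil(total)}")
print(f"variance of d_K: {pvariance(d):.4f}")
```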
Thus we see that (sub-)majorization, as a strategy for finding exponential bounds which incorporate information about the population variance in the problem of sampling without replacement from a bounded finite population, can produce sub-optimal results. As we saw, this is because the (sub-)majorizing hypergeometric population can be more variable than the underlying population from which we sample. However, if our goal is to find uniform exponential bounds, this information loss is immaterial: such bounds apply to all underlying populations, regardless of their variability. This motivates the analysis of the hypergeometric distribution which produced Theorems 1 and 2. We turn to the proofs of these bounds in the concluding section.


4. Proofs of the Bounds 


Our proofs depend on a version of Stirling's formula from Robbins [18].

Lemma 5. For n ∈ ℕ,

$$ \sqrt{2\pi n}\Big(\frac ne\Big)^n e^{\frac{1}{12n+1}} < n! < \sqrt{2\pi n}\Big(\frac ne\Big)^n e^{\frac{1}{12n}}. \tag{4.1} $$
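As a quick check of (4.1), the following Python sketch compares Robbins' bounds with exact factorials for small n. It is our own illustration. The comparison is limited to moderate n because the gap between the upper bound and n! shrinks like 1/(360n³), which eventually falls below floating-point resolution.

```python
import math

def robbins_bounds(n):
    """Lower and upper bounds on n! from (4.1), valid for integers n >= 1."""
    base = math.sqrt(2.0 * math.pi * n) * (n / math.e) ** n
    return base * math.exp(1.0 / (12 * n + 1)), base * math.exp(1.0 / (12 * n))

for n in (1, 5, 20, 100):
    lower, upper = robbins_bounds(n)
    print(n, lower, math.factorial(n), upper)
```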


To prove (2.4), we will need some additional tools. We start with the following lemma. 


Lemma 6. Suppose S_n ~ Hypergeometric(n, D, N) with 1 ≤ n ≤ D ≤ ⌊N/2⌋ and 1 ≤ k ≤ n − 1. Then for k > n(D/N) we have

$$ P(S_n = k) \le \frac{1}{\sqrt{2\pi}}\sqrt{\frac{D(N-D)n(N-n)}{k(D-k)(n-k)(N-D-(n-k))N}}\,\exp\Big(-\frac{2nN}{N-n}u^2\Big)\exp\Big(-\frac n3\Big(1+\frac{n^3}{(N-n)^3}\Big)u^4\Big), $$

where u := k/n − D/N.

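Proof. The proof follows by direct analysis. Using Stirling's formula (4.1), we have

\begin{align}
P(S_n = k) &= \frac{\binom{D}{k}\binom{N-D}{n-k}}{\binom{N}{n}} \tag{4.2}\\
&\le \frac{1}{\sqrt{2\pi}}\sqrt{\frac{D(N-D)n(N-n)}{k(D-k)(n-k)(N-D-(n-k))N}} \nonumber\\
&\quad\cdot\frac{D^D(N-D)^{N-D}n^n(N-n)^{N-n}}{k^k(D-k)^{D-k}(n-k)^{n-k}(N-D-(n-k))^{N-D-(n-k)}N^N} \nonumber\\
&\quad\cdot\exp\Big(\frac{1}{12D}+\frac{1}{12(N-D)}+\frac{1}{12n}+\frac{1}{12(N-n)}\Big) \nonumber\\
&\quad\cdot\exp\Big(-\frac{1}{12k+1}-\frac{1}{12(D-k)+1}-\frac{1}{12(n-k)+1}-\frac{1}{12(N-D-(n-k))+1}-\frac{1}{12N+1}\Big) \nonumber\\
&=: A\cdot B\cdot C. \tag{4.3}
\end{align}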
[Figure 5: two panels plotting the Theorem 5 bound, Serfling's bound, and the Bardenet-Maillard bound against t. Panel (a): n = 250, m = 250, N = 500, t from 0 to 500. Panel (b): n = 60, m = 440, N = 500, t from 0 to 10.]

Fig 5: Comparison of the bounds of Theorem 5 to Bardenet and Maillard's bound (1.6) and Serfling's bound. The first sub-figure (5a) corresponds to the Wilcoxon example. The second sub-figure (5b) corresponds to the Klotz example.
We consider the B term first. Define u := k/n − μ, recalling μ := D/N. We then have

$$\begin{aligned}
B &= \frac{D^D(N-D)^{N-D}n^n(N-n)^{N-n}}{k^k(D-k)^{D-k}(n-k)^{n-k}(N-D-(n-k))^{N-D-(n-k)}N^N}\\
&= \Big(\frac{\mu}{k/n}\Big)^{k}\Big(\frac{1-\mu}{1-k/n}\Big)^{n-k}\cdot\Big(\frac{(D-k)/D}{(N-n)/N}\Big)^{-(D-k)}\Big(\frac{[N-n-(D-k)]/(N-D)}{(N-n)/N}\Big)^{-(N-n-(D-k))}\\
&= \exp(-n\Psi(u,\mu))\cdot B_2,
\end{aligned}$$

where the first factor corresponds to the same function as in Talagrand's argument for the binomial distribution [21, pp. 48-50], B_2 denotes the product of the last two factors, and we recall

$$ \Psi(u,\mu) := (u+\mu)\log\Big(\frac{u+\mu}{\mu}\Big) + (1-(u+\mu))\log\Big(\frac{1-(u+\mu)}{1-\mu}\Big). $$


Now, we can further rewrite B_2 =: exp(−Γ), where

$$\begin{aligned}
\Gamma = -\log(B_2) &= (D-k)\log\Big(\frac{(D-k)/D}{(N-n)/N}\Big) + [N-n-(D-k)]\log\Big(\frac{[N-n-(D-k)]/(N-D)}{(N-n)/N}\Big)\\
&= (N-n)\Big\{\frac{D-k}{N-n}\log\Big(\frac{(D-k)/(N-n)}{\mu}\Big) + \Big(1-\frac{D-k}{N-n}\Big)\log\Big(\frac{1-(D-k)/(N-n)}{1-\mu}\Big)\Big\}.
\end{aligned}$$


Now k = n(u + μ), so

$$ \frac{D-k}{N-n} = \frac{(D-k)/N}{(N-n)/N} = \frac{\mu(1-n/N) - (n/N)u}{1-n/N} = \mu - \frac{n/N}{1-n/N}\,u, $$

and thus we also have

$$ 1 - \frac{D-k}{N-n} = 1 - \mu + \frac{n/N}{1-n/N}\,u. $$

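Thus it follows that, with f = f_N := n/N and f̄ = f̄_N := 1 − f_N,

$$ \Gamma = (N-n)\Big\{\Big(\mu - \frac{f}{\bar f}u\Big)\log\Big(\frac{\mu - \frac{f}{\bar f}u}{\mu}\Big) + \Big(1-\mu+\frac{f}{\bar f}u\Big)\log\Big(\frac{1-\mu+\frac{f}{\bar f}u}{1-\mu}\Big)\Big\} = (N-n)\,\Psi\Big(\frac{f}{\bar f}u,\ 1-\mu\Big), $$

where Ψ is as defined above. Thus the B term can be rewritten as

$$ B = \exp\Big(-n\Psi(u,\mu) - (N-n)\Psi\Big(\frac{f}{\bar f}u,\ 1-\mu\Big)\Big). $$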
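The decomposition of B, and the bound on it derived below as (4.5), are easy to check numerically. The following Python sketch is an illustration, and its helper names are ours. It computes log B directly from its definition in (4.3) and compares it with −nΨ(u, μ) − (N − n)Ψ((f/f̄)u, 1 − μ). It also verifies that this quantity lies below the exponent of (4.5) for a few parameter choices with u > 0.

```python
import math

def Psi(u, mu):
    """Psi(u, mu) = (u+mu) log((u+mu)/mu) + (1-(u+mu)) log((1-(u+mu))/(1-mu))."""
    p = u + mu
    return p * math.log(p / mu) + (1.0 - p) * math.log((1.0 - p) / (1.0 - mu))

def xlogx(x):
    return x * math.log(x) if x > 0 else 0.0

def log_B_direct(k, n, D, N):
    """log of the B term in (4.3), computed from its definition."""
    return (xlogx(D) + xlogx(N - D) + xlogx(n) + xlogx(N - n)
            - xlogx(k) - xlogx(D - k) - xlogx(n - k)
            - xlogx(N - D - (n - k)) - xlogx(N))

def log_B_psi(k, n, D, N):
    """log B via -n Psi(u, mu) - (N - n) Psi((f/fbar) u, 1 - mu)."""
    u, mu = k / n - D / N, D / N
    ratio = n / (N - n)                     # f / fbar with f = n/N
    return -n * Psi(u, mu) - (N - n) * Psi(ratio * u, 1.0 - mu)

def log_bound_45(k, n, D, N):
    """log of the right-hand side of (4.5)."""
    u = k / n - D / N
    return -2 * n * N / (N - n) * u**2 - n / 3 * (1 + n**3 / (N - n) ** 3) * u**4

for k, n, D, N in [(3, 5, 8, 20), (4, 5, 8, 20), (14, 30, 100, 400)]:
    print(log_B_direct(k, n, D, N), log_B_psi(k, n, D, N), log_bound_45(k, n, D, N))
```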

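Now Ψ satisfies Ψ(0, μ) = 0, Ψ'(0, μ) = 0 (derivatives taken in u), and, as in Talagrand (as well as van der Vaart and Wellner [22, pp. 460-461]),

$$ \Psi''(u,\mu) = \frac{\partial^2}{\partial u^2}\Psi(u,\mu) = \frac{1}{(u+\mu)(1-u-\mu)} = \frac{4}{1-4(u-(1/2-\mu))^2} \ge 4\big(1+4(u-(1/2-\mu))^2\big). $$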


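Thus

$$\begin{aligned}
\frac{d^2}{du^2}\Big[n\Psi(u,\mu) + (N-n)\Psi\Big(\frac f{\bar f}u,1-\mu\Big)\Big]
&= \frac{4n}{1-4(u-(1/2-\mu))^2} + (N-n)\Big(\frac f{\bar f}\Big)^2\frac{4}{1-4\big(\frac f{\bar f}u-(\mu-1/2)\big)^2}\\
&\ge 4n\big(1+4(u-(1/2-\mu))^2\big) + 4(N-n)\Big(\frac f{\bar f}\Big)^2\Big(1+4\Big(\frac f{\bar f}u-(\mu-1/2)\Big)^2\Big).
\end{aligned}$$

Integration across this inequality yields

$$\begin{aligned}
\frac{d}{du}\Big[n\Psi(u,\mu) + (N-n)\Psi\Big(\frac f{\bar f}u,1-\mu\Big)\Big]
&\ge 4n\Big(u+\frac13u^3\Big) + 4(N-n)\Big(\frac f{\bar f}\Big)^2\Big(u + \frac13\Big(\frac f{\bar f}\Big)^2u^3\Big)\\
&= 4\Big(n + (N-n)\Big(\frac f{\bar f}\Big)^2\Big)u + \frac43\Big(n + (N-n)\Big(\frac f{\bar f}\Big)^4\Big)u^3\\
&= \frac{4nN}{N-n}\,u + \frac43\,n\Big(1 + \frac{n^3}{(N-n)^3}\Big)u^3.
\end{aligned}\tag{4.4}$$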
Here we used

$$\begin{aligned}
\int_0^u \big(1+4(v-(1/2-\mu))^2\big)\,dv &= u + \frac43\Big[(v-(1/2-\mu))^3\Big]_0^u\\
&= u + \frac43\Big[(u-(1/2-\mu))^3 - (-(1/2-\mu))^3\Big]\\
&= u + \frac43\Big[(u-(1/2-\mu))^3 + (1/2-\mu)^3\Big]\\
&\ge u + \frac43\Big[(u/2)^3 + (u/2)^3\Big] = u + \frac13u^3,
\end{aligned}$$

where the inequality follows since the function β ↦ (u − β)³ + β³ is minimized by β = u/2: with h_u(β) := (u − β)³ + β³,

$$ h_u'(\beta) = 3(u-\beta)^2(-1) + 3\beta^2 = 3\{\beta^2 - (\beta^2 - 2u\beta + u^2)\} = 3u\{2\beta - u\} = 0 \ \text{ if } \beta = u/2, $$

while h_u''(β) = 6u > 0. Similarly,

$$\begin{aligned}
\int_0^u \Big(1+4\Big(\frac f{\bar f}v-(\mu-1/2)\Big)^2\Big)dv &= u + \frac{4\bar f}{3f}\Big[\Big(\frac f{\bar f}v-(\mu-1/2)\Big)^3\Big]_0^u\\
&= u + \frac{4\bar f}{3f}\Big[\Big(\frac f{\bar f}u-(\mu-1/2)\Big)^3 + (\mu-1/2)^3\Big]\\
&\ge u + \frac{4\bar f}{3f}\cdot 2\Big(\frac{f u}{2\bar f}\Big)^3 = u + \frac13\Big(\frac f{\bar f}\Big)^2u^3.
\end{aligned}$$
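Integrating across (4.4) yields

$$ n\Psi(u,\mu) + (N-n)\Psi\Big(\frac f{\bar f}u,\ 1-\mu\Big) \ge \frac{2nN}{N-n}\,u^2 + \frac n3\Big(1+\frac{n^3}{(N-n)^3}\Big)u^4. $$

Thus the B term in (4.3) has the following bound:

$$ B \le \exp\Big(-\frac{2nN}{N-n}\,u^2\Big)\exp\Big(-\frac n3\Big(1 + \frac{n^3}{(N-n)^3}\Big)u^4\Big). \tag{4.5} $$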
We next analyze the C term in (4.3). We have

$$\begin{aligned}
C &= \exp\Big(\frac1{12D}+\frac1{12(N-D)}+\frac1{12n}+\frac1{12(N-n)}\Big)\\
&\quad\cdot\exp\Big(-\frac1{12k+1}-\frac1{12(D-k)+1}-\frac1{12(n-k)+1}-\frac1{12(N-D-(n-k))+1}-\frac1{12N+1}\Big)\\
&= \exp\Big(\frac1{12D}-\frac1{12(D-k)+1}\Big)\exp\Big(\frac1{12n}-\frac1{12k+1}\Big)\exp\Big(\frac1{12(N-D)}-\frac1{12(N-D-(n-k))+1}\Big)\\
&\quad\cdot\exp\Big(\frac1{12(N-n)}-\frac1{12(n-k)+1}\Big)\exp\Big(-\frac1{12N+1}\Big)\\
&= \exp\Big(\frac{1-12k}{12D[12(D-k)+1]}\Big)\exp\Big(\frac{1-12(n-k)}{12(12k+1)n}\Big)\exp\Big(\frac{1-12(n-k)}{12(N-D)[12(N-D-(n-k))+1]}\Big)\\
&\quad\cdot\exp\Big(\frac{1-12(N-2n+k)}{12(12(n-k)+1)(N-n)}\Big)\exp\Big(-\frac1{12N+1}\Big)\\
&\le 1,
\end{aligned}\tag{4.6}$$

where the final inequality follows since nμ < k ≤ n − 1 and n ≤ D ≤ ⌊N/2⌋, which implies that each exponential argument preceding the inequality is negative. This gives a bound of 1 on the product. As the A term in (4.3) is already in the claimed form, combining (4.5) and (4.6) proves the claim. □
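Lemma 6 can be spot-checked against exact hypergeometric probabilities. The Python sketch below is an illustration, not part of the proof. It enumerates all admissible (n, D, k) for small N, namely 1 ≤ n ≤ D ≤ ⌊N/2⌋ and nD/N < k ≤ n − 1, and compares each exact probability with the bound.

```python
import math

def hyper_pmf(k, n, D, N):
    """Exact Hypergeometric(n, D, N) probability P(S_n = k)."""
    return math.comb(D, k) * math.comb(N - D, n - k) / math.comb(N, n)

def lemma6_bound(k, n, D, N):
    """Right-hand side of the bound in Lemma 6, with u = k/n - D/N."""
    u = k / n - D / N
    A = math.sqrt(D * (N - D) * n * (N - n)
                  / (k * (D - k) * (n - k) * (N - D - (n - k)) * N)) / math.sqrt(2 * math.pi)
    exponent = 2 * n * N / (N - n) * u**2 + n / 3 * (1 + n**3 / (N - n) ** 3) * u**4
    return A * math.exp(-exponent)

def check_lemma6(N):
    """Check the bound for all 1 <= n <= D <= N//2 and nD/N < k <= n - 1."""
    for D in range(1, N // 2 + 1):
        for n in range(1, D + 1):
            for k in range(1, n):
                if k > n * D / N and hyper_pmf(k, n, D, N) > lemma6_bound(k, n, D, N):
                    return False
    return True

print(hyper_pmf(3, 5, 8, 20), lemma6_bound(3, 5, 8, 20))
```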

Next we develop an upper bound for hypergeometric tail probabilities. This bound is similar to that 
discussed by Feller for the binomial [5, pp. 150-151]. To our knowledge this result is new. 

Lemma 7. Suppose S_{n,D,N} ~ Hypergeometric(n, D, N), N ≥ 4 and 1 ≤ n, D ≤ N − 1. For k > (nD)/N, we have

$$ P(S_{n,D,N} \ge k) \le P(S_{n,D,N} = k)\,\frac{k(N-D-n+k)}{Nk-nD}. \tag{4.7} $$


Proof. Suppose first that n ≤ D and k = n. Then (4.7) becomes

$$ P(S_{n,D,N} \ge n) \le P(S_{n,D,N} = n)\,\frac{n(N-D-n+n)}{Nn-nD} = P(S_{n,D,N} = n)\,\frac{n(N-D)}{n(N-D)} = P(S_{n,D,N} = n). $$

Since P(S_{n,D,N} ≥ n) = P(S_{n,D,N} = n), the result holds in this case. Next, suppose D ≤ n and k = D. Then (4.7) becomes

$$ P(S_{n,D,N} \ge D) \le P(S_{n,D,N} = D)\,\frac{D(N-D-n+D)}{ND-nD} = P(S_{n,D,N} = D)\,\frac{D(N-n)}{D(N-n)} = P(S_{n,D,N} = D). $$

Since P(S_{n,D,N} ≥ D) = P(S_{n,D,N} = D), the result holds in this case too.

If (n, D, N) is such that ⌊(nD)/N⌋ + 1 = n ∧ D, we are done. Supposing this is not the case, let ⌊(nD)/N⌋ + 1 ≤ j − 1 < j ≤ n ∧ D. Assume the result holds when k = j. We will show this implies the result holds for k = j − 1. Writing S := S_{n,D,N}, we have

$$\begin{aligned}
P(S \ge j-1) &= P(S = j-1) + P(S \ge j)\\
&\le P(S = j-1) + P(S = j)\,\frac{j(N-D-n+j)}{Nj-nD} \qquad\text{(by induction hypothesis)}\\
&= P(S = j-1)\Big[1 + \frac{P(S=j)}{P(S=j-1)}\cdot\frac{j(N-D-n+j)}{Nj-nD}\Big]\\
&= P(S = j-1)\Big[1 + \frac{(D-j+1)(n-j+1)}{j(N-D-n+j)}\cdot\frac{j(N-D-n+j)}{Nj-nD}\Big]\\
&= P(S = j-1)\Big[1 + \frac{(D-j+1)(n-j+1)}{Nj-nD}\Big].
\end{aligned}$$

For k = j − 1, the right-hand side of (4.7) equals P(S = j − 1)(j − 1)(N − D − n + j − 1)/(N(j − 1) − nD), so we see it is enough to show

$$ \frac{(j-1)(N-D-n+j-1)}{N(j-1)-nD} - 1 - \frac{(D-j+1)(n-j+1)}{Nj-nD} \ge 0. $$

Combining terms and simplifying, we find this equivalent to showing

$$ \frac{N(D-j+1)(n-j+1)}{(Nj-nD)(N(j-1)-nD)} \ge 0. $$

Since we assume ⌊(nD)/N⌋ + 1 ≤ j − 1 < j ≤ n ∧ D, we see that each term in parentheses in the fraction is non-negative. In particular, since j ≥ ⌊(nD)/N⌋ + 2 > (nD)/N + 1, we have

$$ N(j-1) - nD > N\frac{nD}{N} - nD = 0. $$

Thus, the expression is non-negative. This implies the claim. □
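Inequality (4.7) can be verified exactly for small populations. The following Python sketch uses exact rational arithmetic (fractions.Fraction). It checks every integer k > nD/N for all 1 ≤ n, D ≤ N − 1. It is an illustration, not a substitute for the proof.

```python
from fractions import Fraction
from math import comb

def pmf(k, n, D, N):
    """Exact P(S_{n,D,N} = k) as a Fraction."""
    return Fraction(comb(D, k) * comb(N - D, n - k), comb(N, n))

def upper_tail(k, n, D, N):
    """Exact P(S_{n,D,N} >= k)."""
    return sum((pmf(j, n, D, N) for j in range(k, min(n, D) + 1)), Fraction(0))

def lemma7_rhs(k, n, D, N):
    """Right-hand side of (4.7)."""
    return pmf(k, n, D, N) * Fraction(k * (N - D - n + k), N * k - n * D)

def check_lemma7(N):
    for n in range(1, N):
        for D in range(1, N):
            for k in range(n * D // N + 1, min(n, D) + 1):   # integers k > nD/N
                if upper_tail(k, n, D, N) > lemma7_rhs(k, n, D, N):
                    return False
    return True

print(upper_tail(3, 5, 8, 20), lemma7_rhs(3, 5, 8, 20))
```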

We next prove a technical lemma. 

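Lemma 8. Fix N ≥ 4. Suppose that n < D ≤ ⌊N/2⌋ and that γ := (N − n)/n. For all triples

$$ (\mu, u, \gamma) \in \Big[\frac{n+1}{N}, \frac12\Big] \times \Big(0, \frac12\Big) \times (1, \infty) $$

we have

$$ \frac{\mu(1-\mu)(u+\mu)(\gamma(1-\mu)+u)}{(1-u-\mu)(\gamma\mu-u)} \le \frac14\cdot\frac{(u+\frac12)(\gamma(1-\frac12)+u)}{(1-u-\frac12)(\gamma\cdot\frac12-u)}. \tag{4.8} $$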


We pause to outline the strategy used to prove this statement, since the proof requires a rather detailed algebraic argument. We break the quantity into two functions, f and g, the second of which, g, is parabolic on μ ∈ [(n + 1)/N, 1/2]. We demonstrate that f is maximized at μ = 1/2. We do this by obtaining the only root which falls in the interval, determining that it yields a local minimum, and finally showing the function is larger at the upper boundary μ = 1/2.

We then show that g has a local maximum in the interior of the interval (for 0 < u < 1/2). Using the quadratic function g as a scaling function, we then define an upper envelope to the function of interest in terms of f, along with a second function that agrees with the function of interest at μ = 1/2. By defining the two new functions in terms of f (scaled by positive numbers, which are obtained at fixed points of g), we are still able to claim these functions are maximized at μ = 1/2.

We then demonstrate the function of interest increases monotonically between the value of μ where it intersects its envelope and μ = 1/2. We finally show that at the right endpoint μ = 1/2, the quantity of interest exceeds its envelope at the left endpoint. This will prove the claim; the details now follow.

Proof of Lemma 8. With the previous comments in mind, define the following functions:

$$ f(\mu) := \frac{\mu(1-\mu)}{(1-u-\mu)(\gamma\mu-u)} \quad\text{and}\quad g(\mu) := (u+\mu)(\gamma(1-\mu)+u). \tag{4.9} $$

Note that the product f(μ)g(μ) gives the quantity on the left-hand side of (4.8). We first analyze f(μ). Taking its derivative, we find

$$ f'(\mu) = \frac{u\big((\gamma-1)\mu^2 + 2\mu(1-u) - (1-u)\big)}{(1-u-\mu)^2(u-\gamma\mu)^2}. $$

Seeking critical points, we find f'(μ) has the following roots:

$$ \mu_\pm = \frac{\pm\sqrt{(1-u)(\gamma-u)} + u - 1}{\gamma-1}. $$

Since μ ∈ (0, 1/2), only the positive root μ_+ is of potential interest. Since γ > 1 under the current restrictions, we have

$$ \mu_+ = \frac{\sqrt{(1-u)(\gamma-u)} + u - 1}{\gamma-1} > \frac{\sqrt{(1-u)^2} + u - 1}{\gamma-1} = 0. $$

Additionally, we can see that μ_+ < 1/2, since, after algebra, this is equivalent to showing 0 < (γ − 1)², which follows under the assumptions. A similar argument shows that the corresponding root with the negative radical is always negative, and therefore does not affect the current investigation. Next, differentiate again and evaluate the second derivative at the root. We then find

$$ f''(\mu_+) = \frac{[a(u,\gamma)]\,[b(u,\gamma)]}{[c(u,\gamma)]^3\,[d(u,\gamma)]^3}, $$

where

$$\begin{aligned}
a(u,\gamma) &:= 2(\gamma-1)^4(1-u)u(\gamma-u), &\qquad b(u,\gamma) &:= (\gamma^2+1)u + 2\gamma\sqrt{(1-u)(\gamma-u)} - \gamma^2 - \gamma,\\
c(u,\gamma) &:= \sqrt{(1-u)(\gamma-u)} - \gamma(1-u), &\qquad d(u,\gamma) &:= \gamma\sqrt{(1-u)(\gamma-u)} - (\gamma-u).
\end{aligned}$$

We next show that this quantity is positive for any (u, γ) ∈ (0, 1/2) × (1, ∞). It is clear that a(u, γ) is always positive under the current assumption, since each term in the product is positive.
We next claim b(u, γ) < 0 for all (u, γ) ∈ (0, 1/2) × (1, ∞). This claim is equivalent to showing

$$ 2\gamma\sqrt{(1-u)(\gamma-u)} < \gamma^2(1-u) + (\gamma-u). $$

Since both sides are positive, we square both sides and simplify to find that the claim is equivalent to showing

$$ 0 < (\gamma-1)^2(-\gamma + \gamma u + u)^2. $$

As this last claim follows for any admissible pair, we conclude b(u, γ) < 0 for all (u, γ) ∈ (0, 1/2) × (1, ∞).

We next show that c(u, γ) < 0 for all (u, γ) ∈ (0, 1/2) × (1, ∞). This claim is equivalent to

$$ (1-u)(\gamma-u) < \gamma^2(1-u)^2, $$

which, after expanding and re-arranging, is equivalent to the claim

$$ 0 < (\gamma-1)(1-u)(\gamma - \gamma u - u) $$

for all (u, γ) ∈ (0, 1/2) × (1, ∞). On this set, it is clear γ − 1 and 1 − u are positive for any admissible pair. Hence, we need only show γ − γu − u > 0 on this set. But this is equivalent to claiming γ(1 − u) > u for any pair on this set, which is true because γ > 1 and u < 1/2. Thus we conclude c(u, γ) < 0 for all (u, γ) ∈ (0, 1/2) × (1, ∞).

We finish this sub-argument by showing d(u, γ) > 0 for (u, γ) ∈ (0, 1/2) × (1, ∞). This claim is equivalent to

$$ \gamma\sqrt{(1-u)(\gamma-u)} > \gamma - u $$

for all admissible pairs. Since both sides are positive, we square and simplify to find the claim equivalent to

$$ p(u) := \gamma^2 - \gamma^2u - \gamma + u > 0. $$

Viewing the left-hand side as a function of u, we differentiate to see p'(u) = 1 − γ² < 0 for any choice of γ > 1. So, p(u) decreases in u for any γ > 1. Hence

$$ p(u) > p(1/2) = \gamma^2 - \frac{\gamma^2}{2} - \gamma + \frac12 = \frac{\gamma^2 - 2\gamma + 1}{2} = \frac{(\gamma-1)^2}{2} > 0. $$

Thus we conclude d(u, γ) > 0 for (u, γ) ∈ (0, 1/2) × (1, ∞).

To summarize: we have shown that for all (u, γ) ∈ (0, 1/2) × (1, ∞), a(u, γ) > 0, b(u, γ) < 0, c(u, γ) < 0 and d(u, γ) > 0. This means that

$$ f''(\mu_+) = \frac{[a(u,\gamma)][b(u,\gamma)]}{[c(u,\gamma)]^3[d(u,\gamma)]^3} > 0. $$

Therefore the only critical point of f in (0, 1/2) is a local minimum, and so the maximum of f over [(n + 1)/N, 1/2] must be achieved at one of the endpoints.
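The closed form for f''(μ_+) and the sign pattern of a, b, c, d can be checked numerically. The Python sketch below is our own illustration. At a few arbitrary (u, γ) pairs it compares the closed form with a central finite-difference approximation, and it checks that f'(μ_+) is numerically zero.

```python
import math

def f(mu, u, gam):
    """f from (4.9)."""
    return mu * (1 - mu) / ((1 - u - mu) * (gam * mu - u))

def mu_plus(u, gam):
    """Positive critical point of f."""
    return (math.sqrt((1 - u) * (gam - u)) + u - 1) / (gam - 1)

def abcd(u, gam):
    """The factors a, b, c, d appearing in f''(mu_+)."""
    s = math.sqrt((1 - u) * (gam - u))
    a = 2 * (gam - 1) ** 4 * (1 - u) * u * (gam - u)
    b = (gam**2 + 1) * u + 2 * gam * s - gam**2 - gam
    c = s - gam * (1 - u)
    d = gam * s - (gam - u)
    return a, b, c, d

def second_derivative_closed(u, gam):
    a, b, c, d = abcd(u, gam)
    return a * b / (c**3 * d**3)

for u, gam in [(0.2, 3.0), (0.1, 9.0), (0.4, 1.5), (0.05, 20.0)]:
    m = mu_plus(u, gam)
    h = 1e-4
    fd2 = (f(m + h, u, gam) - 2 * f(m, u, gam) + f(m - h, u, gam)) / h**2
    print(u, gam, m, fd2, second_derivative_closed(u, gam))
```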

We next show that the maximum is in fact achieved at μ = 1/2. To do this, we compare the values at the two endpoints. Plugging in the definition γ = (N − n)/n, and simplifying, we find:

$$ f\Big(\frac12\Big) - f\Big(\frac{n+1}{N}\Big) = \frac{nu(N-2n-2)\big(nu(N-2n-2)+N\big)}{(1-2u)\big(N(1-u)-n-1\big)(N-2nu-n)\big((n+1)(N-n)-nNu\big)}. $$

Each factor in this expression is non-negative for all u ∈ (0, 1/2), and hence the entire expression is non-negative. To see this, first observe that the restriction n < D ≤ ⌊N/2⌋ means that the maximum value n can attain is ⌊N/2⌋ − 1. This implies N − 2n − 2 ≥ 0. Since we also restrict u ∈ (0, 1/2), we also have N(1 − u) − n − 1 > 0 and N − 2nu − n > 0. Finally, note

$$ (n+1)(N-n) - nNu > (n+1)(N-n) - \frac{nN}{2} = \frac{n(N-2n-2)}{2} + N > 0. $$

We conclude that f(μ) is maximized at μ = 1/2 over all choices of (u, γ) ∈ (0, 1/2) × (1, ∞).

We next consider the function g(μ), defined in (4.9). We write it again, its first two derivatives, and its critical point μ* for subsequent discussion. As this function is much simpler than f(μ), we present these quantities without comment:

$$ g(\mu) = (u+\mu)(\gamma(1-\mu)+u), \qquad g'(\mu) = -2\gamma\mu + \gamma - \gamma u + u, \qquad g''(\mu) = -2\gamma, $$

and

$$ \mu^* = \frac{\gamma(1-u)+u}{2\gamma}. $$

Since g''(μ) < 0 for any choice of (u, γ) ∈ (0, 1/2) × (1, ∞), we see that μ* is a local maximum. For any γ > 1, we also see the critical point decreases for u ∈ (0, 1/2), from a value of 1/2 at u = 0 to a value of 1/4 + 1/(4γ) at u = 1/2. As γ → ∞, this approaches 1/4 asymptotically. Hence for any (u, γ) ∈ (0, 1/2) × (1, ∞), the maximum of the function is attained for μ ∈ (0, 1/2). Since we are ultimately interested in understanding the product f(μ)g(μ), we next show that the maximum of g occurs at a value greater than the local minimum of f. We do this by comparing their difference to zero. The claim

$$ \frac{\gamma(1-u)+u}{2\gamma} - \frac{\sqrt{(1-u)(\gamma-u)} + u - 1}{\gamma-1} \ge 0 $$

is equivalent to the claim

$$ (\gamma-1)(\gamma(1-u)+u) + 2\gamma(1-u) \ge 2\gamma\sqrt{(1-u)(\gamma-u)}. $$

Both sides of this inequality are positive. So, we square them and simplify to find that the claim is equivalent to the claim

$$ (\gamma-1)^2(\gamma(1-u)-u)^2 \ge 0. $$

The claim follows from this final form, since the left-hand side is a product of squares. We now define three related functions.


$$\begin{aligned}
ue(\mu) &:= g\Big(\frac{\gamma(1-u)+u}{2\gamma}\Big)f(\mu) = \frac{(1-\mu)\mu(\gamma+\gamma u+u)^2}{4\gamma(1-\mu-u)(\gamma\mu-u)},\\
t(\mu) &:= g(\mu)f(\mu) = \frac{\mu(1-\mu)(u+\mu)(\gamma(1-\mu)+u)}{(1-u-\mu)(\gamma\mu-u)},\\
ep(\mu) &:= g(1/2)f(\mu) = \frac{(1-\mu)\mu(1+2u)(\gamma+2u)}{4(1-\mu-u)(\gamma\mu-u)}.
\end{aligned}$$

First notice that t(μ) is the quantity of interest, which we wish to show is maximized at μ = 1/2. As defined, the function ue(μ) is an upper envelope of t(μ), with agreement at μ = (γ(1 − u) + u)/(2γ). The function ep(μ) is defined so that ep(1/2) = t(1/2), that is, ep agrees with t at the endpoint of the μ-interval. Consider the behavior of t(μ) on μ ∈ [(γ(1 − u) + u)/(2γ), 1/2]. We have

$$ t'(\mu) = 1 - 2\mu + \frac{(\gamma+1)u^2\big(\gamma\mu^2 - \mu^2 + 2\mu - 2\mu u + u - 1\big)}{(1-\mu-u)^2(\gamma\mu-u)^2}. \tag{4.10} $$
Since μ ≤ 1/2, the term 1 − 2μ is non-negative, so the sign of t'(μ) may be determined by the behavior of the quadratic factor in the numerator of the last term. We consider its behavior separately. Let

$$\begin{aligned}
a(\mu) &:= \gamma\mu^2 - \mu^2 + 2\mu - 2\mu u + u - 1,\\
a'(\mu) &= 2\big(1 - u + \mu(\gamma-1)\big) > 0,\\
a\Big(\frac{\gamma(1-u)+u}{2\gamma}\Big) &= \frac{(\gamma-1)(u-\gamma(1-u))^2}{4\gamma^2}.
\end{aligned}$$

We see then that a(μ) will be non-negative for μ ∈ [(γ(1 − u) + u)/(2γ), 1/2]. Therefore, t'(μ) ≥ 0 on the same interval. Hence, t(μ) is increasing on the same interval. Finally, consider the difference

$$ ep\Big(\frac12\Big) - ue\Big(\frac{n+1}{N}\Big) = \frac{Nu\Big[2u^2(N-2n)(N-2n-1)\big(2n(N-n-1)+N\big) + Nu(N-n)(N-2n-3) + (N-n)^2(N-2n-2) - 4nNu^3(N-2n-1)\Big]}{4(1-2u)(N-n)(N-n-2nu)\big(N(1-u)-n-1\big)\big(N-n+nN(1-u)-n^2\big)}, \tag{4.11} $$

where we have again substituted the definition γ = (N − n)/n. We will now argue that this quantity is positive for all n ∈ {1, ..., ⌊N/2⌋ − 2}. This is sufficient to demonstrate t(μ) is maximized at μ = 1/2, since we are supposing n < D ≤ ⌊N/2⌋. This restriction is necessary to handle the sign change implicit in the term (N − 2n − 3). For n = ⌊N/2⌋ − 2 this term equals 1 (for integer values of N/2), while it flips sign for n = N/2 − 1. However, this sign change is not problematic, since our assumptions imply that at n = N/2 − 1 we have D = N/2, which is the value we are trying to demonstrate maximizes t(μ).
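Because the closed form (4.11) results from lengthy algebra, a numerical comparison with the definitions of ep and ue is a useful safeguard. The Python sketch below is an illustration with hypothetical helper names. It evaluates both sides for several even N, 1 ≤ n ≤ N/2 − 2, and u on a coarse grid in (0, 1/2), and also checks that the closed form is positive there.

```python
def f(mu, u, gam):
    return mu * (1 - mu) / ((1 - u - mu) * (gam * mu - u))

def g(mu, u, gam):
    return (u + mu) * (gam * (1 - mu) + u)

def diff_direct(N, n, u):
    """ep(1/2) - ue((n+1)/N), computed from the definitions of ep and ue."""
    gam = (N - n) / n
    mu_star = (gam * (1 - u) + u) / (2 * gam)
    return g(0.5, u, gam) * f(0.5, u, gam) - g(mu_star, u, gam) * f((n + 1) / N, u, gam)

def diff_closed(N, n, u):
    """The closed form (4.11)."""
    num = N * u * (2 * u**2 * (N - 2*n) * (N - 2*n - 1) * (2*n*(N - n - 1) + N)
                   + N * u * (N - n) * (N - 2*n - 3)
                   + (N - n) ** 2 * (N - 2*n - 2)
                   - 4 * n * N * u**3 * (N - 2*n - 1))
    den = (4 * (1 - 2*u) * (N - n) * (N - n - 2*n*u)
           * (N * (1 - u) - n - 1) * (N - n + n * N * (1 - u) - n**2))
    return num / den

print(diff_direct(10, 2, 0.1), diff_closed(10, 2, 0.1))
```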

We will demonstrate positivity by analyzing the terms in the expression. For simplicity, we will assume N/2 is an integer, though the same analysis will hold for odd values of N. We will consider some of the denominator terms first. We have, using the assumptions,

$$ (N-n) + \big(nN(1-u) - n^2\big) > \frac{n(N-2n-2)}{2} + N > 0. $$

We also have

$$ N - n - 2nu > N - 2n \ge N - (N-4) = 4 > 0. $$

So we see all terms in the denominator are positive for any choice of (u, n). Since also Nu > 0 and (N − n)²(N − 2n − 2) ≥ 0, it is enough to show that under our assumptions

$$ z(u) := 2u^2(N-2n)(N-2n-1)\big(2n(N-n-1)+N\big) + Nu(N-n)(N-2n-3) - 4nNu^3(N-2n-1) > 0. $$

First viewing the left-hand side as a function of u, we observe the following computations:

$$\begin{aligned}
z'(u) &= 4u(N-2n)(N-2n-1)\big(2n(N-n-1)+N\big) + N(N-n)(N-2n-3) - 12nNu^2(N-2n-1),\\
z''(u) &= 4(N-2n)(N-2n-1)\big(2n(N-n-1)+N\big) - 24nNu(N-2n-1),\\
z'''(u) &= -24nN(N-2n-1) < 0.
\end{aligned}$$

From the third derivative, we see z''(u) is decreasing in u. Since z''(0) = 4(N − 2n − 1)(N + 2n(N − n − 1))(N − 2n) > 0, we calculate the value of the second derivative at u = 1/2 to find


$$ z''(1/2) = 4(N-2n-1)\big(4n^3 + 4n^2 + 2nN^2 + N^2 - 6n^2N - 7nN\big) =: 4(N-2n-1)\,\phi(n), $$

where we define the function φ(n) in-line. We analyze the sign of φ(n) for n ∈ {1, ..., ⌊N/2⌋ − 2}. Treating n as continuous temporarily, we differentiate twice to find

$$ \phi''(n) = 24n - 12N + 8. $$

Since we assume n ∈ {1, ..., ⌊N/2⌋ − 2}, we see

$$ \phi''(n) = 24n - 12N + 8 \le 24\Big(\frac N2 - 2\Big) - 12N + 8 = -40 < 0. $$

This implies φ(n) is concave in n. Evaluating at the admissible endpoints, we find

$$ \phi(1) = 8 + N(3N - 13) \quad\text{and}\quad \phi\Big(\frac N2 - 2\Big) = \frac{N^2 + 12N - 32}{2}. $$

For N ≥ 4, both of these expressions are positive. By concavity we conclude φ(n) > 0. Therefore, we have that z''(1/2) > 0, and so we conclude z''(u) > 0 for all u ∈ (0, 1/2]. But since

$$ z'(0) = N(N-n)(N-2n-3) > 0, $$

we infer that z'(u) > 0 for all u ∈ (0, 1/2]. Finally, since z(0) = 0, we conclude that z(u) > 0 for all u ∈ (0, 1/2]. But this implies that

$$ ep\Big(\frac12\Big) - ue\Big(\frac{n+1}{N}\Big) > 0. \tag{4.12} $$


Therefore, we can define the following function

$$ \mathrm{maj}(\mu) := \begin{cases} ue(\mu) & \text{if } \mu \in \Big[\dfrac{n+1}{N},\ \dfrac{\gamma(1-u)+u}{2\gamma}\Big],\\[2ex] t(\mu) & \text{if } \mu \in \Big[\dfrac{\gamma(1-u)+u}{2\gamma},\ \dfrac12\Big]. \end{cases} $$

Observe that for all μ ∈ [(n + 1)/N, 1/2], we have maj(μ) ≥ t(μ). Additionally, we know that maj(μ) is maximized at μ = 1/2: the argument following (4.10) shows that maj(μ) strictly increases for μ such that maj(μ) = t(μ); the argument following (4.12) shows maj(μ) increases to its maximum on the interval. Finally, since we know maj(1/2) = ep(1/2) = t(1/2), we conclude t(μ) is maximized at μ = 1/2 for all choices of (u, γ) ∈ (0, 1/2) × (1, ∞). This completes the proof. □
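Inequality (4.8) can also be spot-checked on a grid. The following Python sketch is an illustration only, since a finite grid proves nothing. It evaluates both sides for even N, 1 ≤ n ≤ N/2 − 2, μ = D/N with n < D ≤ N/2, and u on a grid in (0, 1/2). A tiny relative tolerance allows for the case μ = 1/2, where the two sides coincide.

```python
def lhs_48(mu, u, gam):
    """Left-hand side of (4.8), i.e. t(mu) = g(mu) f(mu)."""
    return mu * (1 - mu) * (u + mu) * (gam * (1 - mu) + u) / ((1 - u - mu) * (gam * mu - u))

def rhs_48(u, gam):
    """Right-hand side of (4.8), i.e. t(1/2)."""
    return 0.25 * (u + 0.5) * (gam * 0.5 + u) / ((0.5 - u) * (gam * 0.5 - u))

def check_lemma8(N, steps=49):
    """Grid check of (4.8) for even N, 1 <= n <= N/2 - 2, mu = D/N with n < D <= N/2."""
    for n in range(1, N // 2 - 1):
        gam = (N - n) / n
        for D in range(n + 1, N // 2 + 1):
            mu = D / N
            for j in range(1, steps + 1):
                u = 0.5 * j / (steps + 1)
                if lhs_48(mu, u, gam) > rhs_48(u, gam) * (1 + 1e-12):
                    return False
    return True

print(all(check_lemma8(N) for N in range(8, 61, 2)))
```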

We are now ready to prove (2.4). 
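Proof of Theorem 1. Pick λ, k > 0 such that k = √n λ + nμ and k > n(D/N). We then have

$$\begin{aligned}
P\big(\sqrt n(\bar X_n - \mu) \ge \lambda\big) &= P\Big(\sum_{i=1}^n X_i \ge k\Big) \le P\Big(\sum_{i=1}^n X_i = k\Big)\,\frac{k(N-D-n+k)}{Nk-nD} &&\text{by (4.7)}\\
&\le \frac{1}{\sqrt{2\pi}}\sqrt{\frac{D(N-D)n(N-n)}{k(D-k)(n-k)(N-D-(n-k))N}}\;\frac{k(N-D-n+k)}{Nk-nD}\\
&\qquad\times\exp\Big(-\frac{2nN}{N-n}u^2\Big)\exp\Big(-\frac n3\Big(1+\frac{n^3}{(N-n)^3}\Big)u^4\Big) &&\text{by (4.2)}\\
&=: W\exp\Big(-\frac{2nN}{N-n}u^2\Big)\exp\Big(-\frac n3\Big(1+\frac{n^3}{(N-n)^3}\Big)u^4\Big).
\end{aligned}\tag{4.13}$$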


Recall that u := (k/n) − (D/N) in the previous bound. Define f := n/N, f̄ := 1 − f = (N − n)/N, μ := D/N, and furthermore define the ratio γ := f̄/f = (N − n)/n. We may then write

$$ D - k = N\frac{D}{N} - n\frac{k}{n} = n\Big(\frac Nn\,\mu - u - \mu\Big) = n(\gamma\mu - u). $$

Similarly we have

$$ N - n - (D-k) = N - n - n(\gamma\mu - u) = n(\gamma - \gamma\mu + u) = n(\gamma[1-\mu] + u). $$

Using these parametrizations, together with Nk − nD = Nnu, we may write

$$\begin{aligned}
W &= \frac{1}{\sqrt{2\pi}}\sqrt{\frac{D(N-D)n(N-n)}{k(D-k)(n-k)(N-D-(n-k))N}}\;\frac{k(N-n-(D-k))}{Nn\big((k/n)-(D/N)\big)}\\
&= \frac{1}{\sqrt{2\pi}}\sqrt{\frac{D(N-D)n(N-n)\,k^2(N-n-(D-k))^2}{k(D-k)(n-k)(N-n-(D-k))N^3n^2u^2}}\\
&= \sqrt{\frac{N-n}{2\pi nNu^2}\cdot\frac DN\Big(1-\frac DN\Big)\cdot\frac{(k/n)(N-n-(D-k))}{(1-k/n)(D-k)}}\\
&= \sqrt{\frac{N-n}{2\pi nNu^2}\cdot\frac{\mu(1-\mu)(u+\mu)(\gamma(1-\mu)+u)}{(1-u-\mu)(\gamma\mu-u)}}\\
&\le \sqrt{\frac{N-n}{2\pi nNu^2}\cdot\frac14\cdot\frac{(u+\frac12)(\gamma(1-\frac12)+u)}{(1-u-\frac12)(\gamma\frac12-u)}},
\end{aligned}\tag{4.14}$$

with the last inequality following by (4.8) established in Lemma 8. Observe that under these parametrizations u = λ/√n. Hence, if we use (4.14) to provide an upper bound for (4.13), substitute λ/√n for u, and then simplify, the claim is proved. □
Some of the machinery developed in the preceding lemmas will be adapted to prove (2.5). The argument follows.

Proof of Theorem 2. Suppose now that 1 < n < D < N—l. We consider k such that OVn+D—N < k < n. 
The decomposition of a Hypergeometirc probability into A, B 1 and C terms stated in (4.3) still applies. For 
k > n(D/N ), the bound on the B term in (4.5) still holds. Thus we may write 


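$$
\begin{aligned}
B &\le \exp\left(-\frac{2nN}{N-n}u^2 - \frac{n}{3}\left[1 + \frac{n^3}{(N-n)^3}\right]u^4\right)\\
&= \exp\left(-\frac{2n}{1-\frac{n}{N}}u^2\right)\exp\left(-\frac{n}{12}u^4\right)\exp\left(-\frac{n}{4}u^4\right)\exp\left(-\frac{n^4}{3(N-n)^3}u^4\right).
\end{aligned}
\tag{4.15}
$$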
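The decomposition (4.3) is not shown in this section. Assuming that $B$ is the factor of Stirling's formula that collects the powers, $B = \frac{D^D(N-D)^{N-D}n^n(N-n)^{N-n}}{k^k(D-k)^{D-k}(n-k)^{n-k}(N-D-n+k)^{N-D-n+k}N^N}$ (the $e^{-m}$ factors cancel), the following Python sketch compares $\log B$ with the exponent in the first line of (4.15) over all small populations and all $k \ge nD/N$ in the support. It is a numerical check under that assumption, not a substitute for the argument behind (4.5); the helper names are hypothetical.

```python
import math


def xlogx(x: float) -> float:
    """x * log(x), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def log_B(n: int, D: int, N: int, k: int) -> float:
    """log of the assumed Stirling power factor B."""
    return (xlogx(D) + xlogx(N - D) + xlogx(n) + xlogx(N - n) - xlogx(N)
            - xlogx(k) - xlogx(D - k) - xlogx(n - k) - xlogx(N - D - n + k))


def log_B_bound(n: int, D: int, N: int, k: int) -> float:
    """Exponent on the right-hand side of the first line of (4.15), with u = k/n - D/N."""
    u = k / n - D / N
    quartic = (n / 3) * (1 + n ** 3 / (N - n) ** 3)
    return -(2 * n * N / (N - n)) * u ** 2 - quartic * u ** 4


def worst_gap(N_max: int = 20) -> float:
    """Largest value of log_B - log_B_bound over k >= nD/N (it should not exceed 0)."""
    worst = -math.inf
    for N in range(2, N_max + 1):
        for D in range(1, N):
            for n in range(1, N):
                for k in range(max(0, n - (N - D)), min(D, n) + 1):
                    if N * k >= n * D:
                        worst = max(worst, log_B(n, D, N, k) - log_B_bound(n, D, N, k))
    return worst


if __name__ == "__main__":
    print(worst_gap())
```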


Also recall that we showed $C \le 1$ at (4.6) when $n < D \le N/2$. In fact, the expression at (4.6) shows $C \le 1$ under the current assumptions. When $n \le N/2$, all exponential arguments can be seen to be negative by inspection. When $n > N/2$, the only fraction whose sign is unclear is

$$\frac{1 - 12(N - 2n + k)}{12\big(12(n-k)+1\big)(N-n)}.$$

However, this fraction remains negative under the current assumptions: since $k \ge n + D - N$, we have $N + k \ge n + D$, and so $N + k - 2n \ge D - n > 0$. We thus conclude $C \le 1$. Here, though, we provide a new analysis of the $A$ term under the current assumptions.

Case 1 

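First restrict $k$ so that $\mu_0 \le \frac{k}{n} \le 1 - \frac{\mu_0}{2}$. We then have

$$
\begin{aligned}
A &= \frac{1}{\sqrt{2\pi}}\sqrt{\frac{D(N-D)n(N-n)}{k(D-k)(n-k)(N-D-(n-k))N}}
= \frac{N}{\sqrt{2\pi n}}\sqrt{\frac{\frac{D}{N}\left(1-\frac{D}{N}\right)\left(1-\frac{n}{N}\right)}{\frac{k}{n}(D-k)\left(1-\frac{k}{n}\right)(N-D-n+k)}}\\
&\le \frac{N}{\sqrt{2\pi n}}\sqrt{\frac{(1/4)(1-\psi_0)}{\mu_0\left(D-n+n\frac{\mu_0}{2}\right)\frac{\mu_0}{2}\left(N-D-n+n\mu_0\right)}}
\le \frac{N}{\sqrt{2\pi n}}\sqrt{\frac{(1/4)(1-\psi_0)}{\mu_0\left(n\frac{\mu_0}{2}\right)\frac{\mu_0}{2}\left(n\mu_0\right)}}\\
&= \frac{N}{n}\,\frac{1}{\sqrt{n}}\,\frac{\sqrt{1-\psi_0}}{\sqrt{2\pi}\,\mu_0^2}
\le \frac{1}{\sqrt{n}}\,\frac{\sqrt{1-\psi_0}}{\psi_0\sqrt{2\pi}\,\mu_0^2}.
\end{aligned}
\tag{4.16}
$$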


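Combining (4.16) with (4.15) and (4.6), we have the bound

$$\frac{\binom{D}{k}\binom{N-D}{n-k}}{\binom{N}{n}} \le \frac{K_{c_1}}{\sqrt{n}}\exp\left(-\frac{2n}{1-\frac{n}{N}}u^2\right)\exp\left(-\frac{n}{12}u^4\right)\exp\left(-\frac{n}{4}u^4\right)\exp\left(-\frac{n^4}{3(N-n)^3}u^4\right),$$

where

$$K_{c_1} := \frac{\sqrt{1-\psi_0}}{\psi_0\sqrt{2\pi}\,\mu_0^2}.$$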


Case 2 

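Next, suppose that $1 - \frac{\mu_0}{2} < \frac{k}{n} < 1$. This implies that

$$u = \frac{k}{n} - \frac{D}{N} > 1 - \frac{\mu_0}{2} - \frac{D}{N} \ge 1 - \frac{\mu_0}{2} - (1 - \mu_0) = \frac{\mu_0}{2}.$$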
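We can bound the $A$ term by

$$
\begin{aligned}
A &= \frac{1}{\sqrt{2\pi}}\sqrt{\frac{D(N-D)n(N-n)}{k(D-k)(n-k)(N-D-(n-k))N}}
= \frac{N}{\sqrt{2\pi}}\sqrt{\frac{\frac{D}{N}\left(1-\frac{D}{N}\right)\left(1-\frac{n}{N}\right)}{\frac{k}{n}(D-k)(n-k)(N-D-n+k)}}\\
&\le \frac{N}{\sqrt{2\pi}}\sqrt{\frac{(1/4)(1-\psi_0)}{\left(1-\frac{\mu_0}{2}\right)(D-n+1)(n-n+1)\left(N-D-n+n\left(1-\frac{\mu_0}{2}\right)\right)}}\\
&\le \frac{N}{\sqrt{2\pi}}\sqrt{\frac{(1/4)(1-\psi_0)}{\left(1-\frac{\mu_0}{2}\right)\left(n\left(1-\frac{\mu_0}{2}\right)\right)}}
= \frac{N}{n}\,\frac{n}{\sqrt{n}}\sqrt{\frac{(1/4)(1-\psi_0)}{2\pi\left(1-\frac{\mu_0}{2}\right)^2}}\\
&\le \frac{1}{\psi_0}\,\frac{n}{\sqrt{n}}\sqrt{\frac{(1/4)(1-\psi_0)}{2\pi\left(1-\frac{\mu_0}{2}\right)^2}}
= \frac{n}{\sqrt{n}}\sqrt{\frac{(1/4)(1-\psi_0)}{2\pi\psi_0^2\left(1-\frac{\mu_0}{2}\right)^2}}.
\end{aligned}
$$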


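Taking the $\exp\left(-\frac{n}{12}u^4\right)$ term from (4.15), we have

$$n\exp\left(-\frac{n}{12}u^4\right) \le n\exp\left(-\frac{n}{12}\left(\frac{\mu_0}{2}\right)^4\right) = n\exp\left(-\frac{\mu_0^4}{192}\,n\right).$$

As a function of $n$, this is maximized at $n = 192/\mu_0^4$, and so

$$n\exp\left(-\frac{n}{12}u^4\right) \le \frac{192}{\mu_0^4 e}.$$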
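The maximization step is the elementary fact that $n \mapsto ne^{-an}$ peaks at $n = 1/a$ with value $1/(ae)$; here $a = \mu_0^4/192$. A short Python check over integer $n$ (the function names and the scan range are illustrative choices):

```python
import math


def max_over_integers(a: float) -> float:
    """max over integers n >= 1 of n * exp(-a n), scanning up to ten times the maximizer 1/a."""
    n_stop = 10 * math.ceil(1 / a)
    return max(n * math.exp(-a * n) for n in range(1, n_stop + 1))


def case2_constant(mu0: float) -> float:
    """The closed-form bound 192 / (mu0**4 * e) used in Case 2."""
    return 192 / (mu0 ** 4 * math.e)


if __name__ == "__main__":
    mu0 = 0.5
    a = mu0 ** 4 / 192          # rate in n * exp(-n * mu0^4 / 192)
    print(max_over_integers(a), case2_constant(mu0))
```

The integer maximum can only fall below the continuous one, which is why the closed form is a valid upper bound for every $n$.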

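Combining the remaining terms in (4.5) with this bound on the $A$ term and the $C$ bound of 1 yields

$$\frac{\binom{D}{k}\binom{N-D}{n-k}}{\binom{N}{n}} \le \frac{K_{c_2}}{\sqrt{n}}\exp\left(-\frac{2n}{1-\frac{n}{N}}u^2\right)\exp\left(-\frac{n}{4}u^4\right)\exp\left(-\frac{n^4}{3(N-n)^3}u^4\right),$$

where

$$K_{c_2} := \sqrt{\frac{(1/4)(1-\psi_0)}{2\pi\psi_0^2\left(1-\frac{\mu_0}{2}\right)^2}}\;\frac{192}{\mu_0^4 e}.$$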



Case: $k = n$

When $k = n$, there are only two binomial coefficients to consider in the hypergeometric probability. Therefore, we must derive a new bound via Stirling's formula. Doing so yields

$$\frac{\binom{D}{n}\binom{N-D}{0}}{\binom{N}{n}} = \frac{D!\,(N-n)!}{(D-n)!\,N!} \le \sqrt{\frac{D(N-n)}{(D-n)N}}\;\frac{D^D(N-n)^{N-n}}{(D-n)^{D-n}N^N}\;\exp\left(\frac{1}{12D} + \frac{1}{12(N-n)} - \frac{1}{12(D-n)+1} - \frac{1}{12N+1}\right) =: A'B'C'.$$
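The display above combines $m! = \sqrt{2\pi m}\,(m/e)^m e^{r_m}$ with the standard bounds $\frac{1}{12m+1} < r_m < \frac{1}{12m}$; the $\sqrt{2\pi}$ and $e^{-m}$ factors cancel. The Python sketch below compares the exact log-ratio (computed with `math.lgamma`) with $\log(A'B'C')$ for $1 \le n < D < N$; the helper names are hypothetical.

```python
import math


def log_ratio_exact(n: int, D: int, N: int) -> float:
    """log[C(D, n) / C(N, n)] = log[D! (N - n)! / ((D - n)! N!)]."""
    return (math.lgamma(D + 1) + math.lgamma(N - n + 1)
            - math.lgamma(D - n + 1) - math.lgamma(N + 1))


def log_ApBpCp(n: int, D: int, N: int) -> float:
    """log(A' B' C') from the Stirling bound; requires 1 <= n < D < N."""
    log_a = 0.5 * math.log(D * (N - n) / ((D - n) * N))
    log_b = (D * math.log(D) + (N - n) * math.log(N - n)
             - (D - n) * math.log(D - n) - N * math.log(N))
    log_c = (1 / (12 * D) + 1 / (12 * (N - n))
             - 1 / (12 * (D - n) + 1) - 1 / (12 * N + 1))
    return log_a + log_b + log_c


if __name__ == "__main__":
    print(log_ratio_exact(3, 10, 25), log_ApBpCp(3, 10, 25))
```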


We can bound $C'$ by

$$
\begin{aligned}
C' &= \exp\left(\frac{1}{12D} - \frac{1}{12N+1} + \frac{1}{12(N-n)} - \frac{1}{12(D-n)+1}\right)\\
&= \exp\left(\frac{12(N-D)+1}{(12D)(12N+1)} - \frac{12(N-D)-1}{\big(12(N-n)\big)\big(12(D-n)+1\big)}\right)\\
&\le 1,
\end{aligned}
$$

with the final bound following since $N - D > 0$ and $[12(N-n)][12(D-n)+1] \le [12D][12N+1]$.
Continuing with $B'$, we have

$$B' = \frac{D^D(N-n)^{N-n}}{(D-n)^{D-n}N^N} = \left(\frac{D}{N}\right)^{n}\left(\frac{D/N}{(D-n)/(N-n)}\right)^{D-n}\left(\frac{(N-D)/N}{[N-n-(D-n)]/(N-n)}\right)^{N-D},$$


where, as before, we have

$$B' = \exp\left(-(N-n)\left[\frac{D-n}{N-n}\log\frac{(D-n)/D}{(N-n)/N} + \left(1 - \frac{D-n}{N-n}\right)\log\frac{[N-n-(D-n)]/(N-D)}{(N-n)/N}\right] + n\log(\mu)\right).$$
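The entropy form of $B'$ can be checked numerically. The sketch assumes, as the display indicates, that the bracketed term is the Bernoulli relative entropy $K(a\,\|\,b) = a\log\frac{a}{b} + (1-a)\log\frac{1-a}{1-b}$ evaluated at $a = (D-n)/(N-n) = \mu - u/\gamma$ and $b = \mu$, with $u = 1 - \mu$ because $k = n$. `kl_bernoulli` and the other names are hypothetical helpers.

```python
import math


def kl_bernoulli(a: float, b: float) -> float:
    """Relative entropy K(a || b) between Bernoulli(a) and Bernoulli(b), with 0 log 0 = 0."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / b)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - b))
    return total


def log_Bprime_direct(n: int, D: int, N: int) -> float:
    """log B' = log[D^D (N-n)^(N-n) / ((D-n)^(D-n) N^N)]; requires 1 <= n < D < N."""
    return (D * math.log(D) + (N - n) * math.log(N - n)
            - (D - n) * math.log(D - n) - N * math.log(N))


def log_Bprime_entropy(n: int, D: int, N: int) -> float:
    """-(N - n) K(mu - u/gamma || mu) + n log(1 - u), with u = 1 - mu since k = n."""
    mu = D / N
    u = 1 - mu                      # u = k/n - mu with k = n
    gamma = (N - n) / n
    return -(N - n) * kl_bernoulli(mu - u / gamma, mu) + n * math.log(1 - u)


if __name__ == "__main__":
    print(log_Bprime_direct(4, 10, 30), log_Bprime_entropy(4, 10, 30))
```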


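Using the previous analysis, we can write

$$B' = \exp\big(-(N-n)\,\psi(\gamma, u) + n\log(\mu)\big) = \exp\big(-(N-n)\,\psi(\gamma, u) + n\log(1-u)\big),$$

where $\psi(\gamma, u)$ denotes the bracketed expression above (note that $(D-n)/(N-n) = \mu - u/\gamma$), $\gamma := \bar f/f$ with $f := n/N$ and $\bar f := 1 - f = (N-n)/N$, and where we use the equality $u = 1 - \mu$, which holds under the current hypothesis $k = n$. Using the analysis from van der Vaart and Wellner, page 461, re-parametrized to the situation at hand, we obtain

$$\psi(\gamma, u) \ge 2\left(\frac{u}{\gamma}\right)^2 + \frac{1}{3}\left(\frac{u}{\gamma}\right)^4.$$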

We also have the bound via the Taylor expansion:

$$\log(1-u) = -\sum_{k=1}^{\infty}\frac{u^k}{k} \le -\sum_{k=1}^{7}\frac{u^k}{k}.$$
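Hence

$$
\begin{aligned}
B' &\le \exp\left(-(N-n)\left[2\left(\frac{u}{\gamma}\right)^2 + \frac{1}{3}\left(\frac{u}{\gamma}\right)^4\right] + n\log(1-u)\right)\\
&= \exp\left(-2\frac{n^2}{N-n}u^2 - \frac{1}{3}\frac{n^4}{(N-n)^3}u^4 + n\log(1-u)\right)\\
&\le \exp\left(-2\frac{n^2}{N-n}u^2 - \frac{1}{3}\frac{n^4}{(N-n)^3}u^4 - n\sum_{k=1}^{7}\frac{u^k}{k}\right)\\
&= \exp\left(-\frac{2nN}{N-n}u^2\right)\exp\left(-\frac{1}{3}\frac{n^4}{(N-n)^3}u^4\right)\exp\left(-\frac{nu^4}{4}\right)\exp\left(-\frac{nu^5}{5}\right)\\
&\qquad\times\exp\left(-n\left[u - \frac{3u^2}{2} + \frac{u^3}{3} + \frac{u^6}{6} + \frac{u^7}{7}\right]\right)\\
&\le \exp\left(-\frac{2n}{1-\frac{n}{N}}u^2\right)\exp\left(-\frac{1}{3}\frac{n^4}{(N-n)^3}u^4\right)\exp\left(-\frac{nu^4}{4}\right)\exp\left(-\frac{nu^5}{5}\right),
\end{aligned}
$$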

where the last inequality follows since, for $x > 0$,

$$x - \frac{3x^2}{2} + \frac{x^3}{3} + \frac{x^6}{6} + \frac{x^7}{7} > 0.$$

On $x \ge 0$ this polynomial has its global minimum at $x = 0$ (where it vanishes) and a local minimum at $x \approx 0.851662$, with a value of approximately $0.0796078$. Finally, we have


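$$A' = \sqrt{\frac{D(N-n)}{(D-n)N}} = \frac{n}{\sqrt{n}}\sqrt{\frac{\frac{D}{N}\left(1-\frac{n}{N}\right)}{(D-n)\frac{n}{N}}} \le \frac{n}{\sqrt{n}}\sqrt{\frac{(1-\mu_0)(1-\psi_0)}{\psi_0}},$$

where the final inequality uses the fact that $D - n \ge 1$ in this case. Taking the factor $\exp\left(-\frac{nu^5}{5}\right)$ from the bound on $B'$, and observing that $u = 1 - \mu \ge \mu_0$, we have

$$n\exp\left(-\frac{nu^5}{5}\right) \le n\exp\left(-\frac{\mu_0^5}{5}\,n\right) \le \frac{5}{\mu_0^5 e},$$

since $xe^{-x} \le e^{-1}$ for $x > 0$. Combining the bounds on $A'$, $B'$, and $C'$, we have shown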




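$$\frac{\binom{D}{n}\binom{N-D}{0}}{\binom{N}{n}} \le \frac{K_{c_3}}{\sqrt{n}}\exp\left(-\frac{2n}{1-\frac{n}{N}}u^2\right)\exp\left(-\frac{1}{3}\frac{n^4}{(N-n)^3}u^4\right)\exp\left(-\frac{nu^4}{4}\right),$$

where

$$K_{c_3} := \sqrt{\frac{(1-\mu_0)(1-\psi_0)}{\psi_0}}\;\frac{5}{\mu_0^5 e}.$$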
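The positivity claim for $p(x) = x - \frac{3x^2}{2} + \frac{x^3}{3} + \frac{x^6}{6} + \frac{x^7}{7}$, used in the bound on $B'$ above, is easy to reproduce numerically. The Python sketch below (function names are illustrative) locates the local minimum by bisection on $p'(x) = 1 - 3x + x^2 + x^5 + x^6$ over $[0.6, 1]$, where $p'$ is negative at the left end and positive at the right end.

```python
def p(x: float) -> float:
    """Polynomial x - 3x^2/2 + x^3/3 + x^6/6 + x^7/7 from the bound on B'."""
    return x - 1.5 * x ** 2 + x ** 3 / 3 + x ** 6 / 6 + x ** 7 / 7


def dp(x: float) -> float:
    """Derivative p'(x) = 1 - 3x + x^2 + x^5 + x^6."""
    return 1 - 3 * x + x ** 2 + x ** 5 + x ** 6


def local_min(lo: float = 0.6, hi: float = 1.0, iters: int = 100) -> float:
    """Bisection for the root of p' in [lo, hi], where p' goes from negative to positive."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if dp(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    x_star = local_min()
    print(f"{x_star:.6f} {p(x_star):.7f}")   # 0.851662 0.0796078
```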


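Hence, if we set $K_1 := \max(K_{c_1}, K_{c_2}, K_{c_3})$, we have the bound

$$\frac{\binom{D}{k}\binom{N-D}{n-k}}{\binom{N}{n}} \le \frac{K_1}{\sqrt{n}}\exp\left(-\frac{2n}{1-\frac{n}{N}}u^2\right)\exp\left(-\frac{1}{3}\frac{n^4}{(N-n)^3}u^4\right)\exp\left(-\frac{nu^4}{4}\right).\tag{4.17}$$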
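Plugging in the definitions $k = \sqrt{n}\,\lambda + n\mu$ and $u = k/n - \mu$ (so that $u = \lambda/\sqrt{n}$), we obtain

$$P\Big(\sum_{i=1}^n X_i = k\Big) = P\big(\sqrt{n}(\bar X_n - \mu) = \lambda\big) \le \frac{K_1}{\sqrt{n}}\exp\left(-\frac{2\lambda^2}{1-\frac{n}{N}}\right)\exp\left(-\frac{1}{3}\left(\frac{n}{N-n}\right)^3\frac{\lambda^4}{n}\right)\exp\left(-\frac{\lambda^4}{4n}\right).$$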


This gives inequality (i). To obtain inequality (ii), define, for any pair $n, N$ subject to our conditions,

$$h(s) := \frac{2}{1-\frac{n}{N}}\,s^2 + \left(\frac{1}{3}\left(\frac{n}{N-n}\right)^3 + \frac{1}{4}\right)s^4 =: as^2 + bs^4,$$

with $a, b > 0$ since $N > n$. Hence $h$ is convex. Therefore, as in the Talagrand argument, we also have $h(s) \ge h(u) + (s-u)h'(u)$ for all $s$. Also, for $0 \le s \le 1$ we see that $h'(s) = 2as + 4bs^3$ has linear envelopes

$$2as \le h'(s) \le (2a + 4b)s.$$
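A quick numerical check of the two facts used here, the tangent-line inequality for the convex function $h$ and the envelopes of $h'$ on $[0, 1]$, with $a$ and $b$ as just defined. The grid resolution, the tolerance and the helper names are arbitrary choices for illustration.

```python
def h_coeffs(n: int, N: int) -> tuple[float, float]:
    """Coefficients a, b of h(s) = a s^2 + b s^4 (requires 1 <= n < N)."""
    a = 2 / (1 - n / N)
    b = (n / (N - n)) ** 3 / 3 + 1 / 4
    return a, b


def h(s: float, a: float, b: float) -> float:
    return a * s ** 2 + b * s ** 4


def dh(s: float, a: float, b: float) -> float:
    return 2 * a * s + 4 * b * s ** 3


def check_h(n: int, N: int, steps: int = 200) -> bool:
    """Grid check of the tangent-line inequality and the linear envelopes of h' on [0, 1]."""
    a, b = h_coeffs(n, N)
    grid = [i / steps for i in range(steps + 1)]
    tol = 1e-9 * (1 + a + b)
    for u in grid:
        for s in grid:
            # convexity: h lies above its tangent line at u
            if h(s, a, b) < h(u, a, b) + (s - u) * dh(u, a, b) - tol:
                return False
        # envelopes: 2 a u <= h'(u) <= (2a + 4b) u for 0 <= u <= 1
        if not (2 * a * u - tol <= dh(u, a, b) <= (2 * a + 4 * b) * u + tol):
            return False
    return True


if __name__ == "__main__":
    print(check_h(5, 40))
```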

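Let $0 < t \le \lambda \le \sqrt{n}$, and let $k_0 := \lceil n\mu + \sqrt{n}\,t\rceil$. Using the bound at (4.17), we have, for $0 < u \le 1$ with $\sqrt{n}\,u \ge t$ (eventually $u = \lambda/\sqrt{n}$),

$$
\begin{aligned}
\sum_{k \ge k_0}\frac{\binom{D}{k}\binom{N-D}{n-k}}{\binom{N}{n}}
&\le \sum_{k\ge k_0}\frac{K_1}{\sqrt{n}}\exp\left(-nh\left(\frac{k}{n} - \frac{D}{N}\right)\right)\\
&\le \frac{K_1}{\sqrt{n}}\sum_{k\ge k_0}\exp\left(-n\left[h(u) + \left(\frac{k}{n} - \frac{D}{N} - u\right)h'(u)\right]\right)\\
&= \frac{K_1}{\sqrt{n}}\exp(-nh(u))\sum_{k\ge k_0}\exp\big([nu - (k - n\mu)]\,h'(u)\big)\\
&\le \frac{K_1}{\sqrt{n}}\exp(-nh(u))\,\frac{\exp\big([nu - (k_0 - n\mu)]\,h'(u)\big)}{1 - \exp(-h'(u))}\\
&\le \frac{K_1}{\sqrt{n}}\exp(-nh(u))\,\frac{K_{ab}}{2au}\exp\big([nu - \sqrt{n}\,t]\,[2a+4b]\,u\big)\\
&= \frac{K_2}{\sqrt{n}\,u}\exp(-nh(u))\exp\left(nu\left(u - \frac{t}{\sqrt{n}}\right)[2a+4b]\right),
\end{aligned}
$$

where $K_{ab}$ is a constant that depends on $a$ and $b$, and hence on $n$ and $N$ (as we explain below), and

$$K_2 := \frac{K_1 K_{ab}}{2a} \le \frac{K_1 K_{ab}}{4} \quad (\text{since } a \ge 2).$$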
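The fourth line of the chain bounds the sum by the full geometric series with ratio $e^{-h'(u)}$ over all integers $k \ge k_0$, and the fifth uses $k_0 - n\mu \ge \sqrt{n}\,t$ together with the envelope $h'(u) \le (2a+4b)u$. The sketch compares a long truncated sum with the closed form and checks the exponent comparison, for one arbitrary choice of $n$, $N$, $D$, $\lambda$ and $t$ with $u = \lambda/\sqrt{n}$; all numbers and names are illustrative.

```python
import math


def geometric_tail(n: int, mu: float, u: float, hp: float, k0: int, terms: int = 20_000) -> float:
    """Truncated sum over k >= k0 of exp([n u - (k - n mu)] * h'(u))."""
    return sum(math.exp((n * u - (k - n * mu)) * hp) for k in range(k0, k0 + terms))


def geometric_closed_form(n: int, mu: float, u: float, hp: float, k0: int) -> float:
    """exp([n u - (k0 - n mu)] * h'(u)) / (1 - exp(-h'(u)))."""
    return math.exp((n * u - (k0 - n * mu)) * hp) / (1 - math.exp(-hp))


if __name__ == "__main__":
    n, N, D = 50, 200, 60
    mu = D / N
    lam, t = 1.5, 1.0                          # 0 < t <= lambda <= sqrt(n)
    u = lam / math.sqrt(n)
    a = 2 / (1 - n / N)
    b = (n / (N - n)) ** 3 / 3 + 1 / 4
    hp = 2 * a * u + 4 * b * u ** 3            # h'(u)
    k0 = math.ceil(n * mu + math.sqrt(n) * t)
    print(geometric_tail(n, mu, u, hp, k0), geometric_closed_form(n, mu, u, hp, k0))
```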

We determine $K_{ab}$ by observing that $1 - e^{-v} \ge v/M$ for $0 \le v \le v_0$, where $M = M_{v_0} := v_0/(1 - e^{-v_0})$, together with

$$h'(u) \le (2a + 4b)u \le 2a + 4b = \frac{4}{1-\frac{n}{N}} + \frac{4}{3}\left(\frac{n}{N-n}\right)^3 + 1 =: v_N \quad (\text{since } u \le 1),$$

and

$$v_N \le \frac{4}{\psi_0} + \frac{4(1-\psi_0)^3}{3\psi_0^3} + 1 =: v_0.$$
Therefore, $K_{ab}$ can be taken to be $M = v_0/(1 - e^{-v_0})$ or $M_N = v_N/(1 - e^{-v_N})$, depending on how much dependence on $n$ and $N$ we leave in the bounds. Again, by definition, we have

$$2a + 4b = \frac{4}{1-\frac{n}{N}} + \frac{4}{3}\left(\frac{n}{N-n}\right)^3 + 1.$$
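The choice of $K_{ab}$ rests on the fact that $v \mapsto (1 - e^{-v})/v$ is decreasing, which gives $1 - e^{-v} \ge v/M$ on $[0, v_0]$. The Python sketch checks this on a grid, together with $v_N = 2a + 4b$ and the uniform bound $v_N \le v_0$ when $\psi_0 \le n/N \le 1 - \psi_0$ (that range for $n/N$ is an assumption made for the illustration, matching how $v_0$ is formed above); the names are hypothetical.

```python
import math


def M_of(v0: float) -> float:
    """M = v0 / (1 - exp(-v0)), so that 1 - exp(-v) >= v / M on [0, v0]."""
    return v0 / (1 - math.exp(-v0))


def v_N(n: int, N: int) -> float:
    """2a + 4b = 4/(1 - n/N) + (4/3)(n/(N - n))^3 + 1."""
    return 4 / (1 - n / N) + (4 / 3) * (n / (N - n)) ** 3 + 1


def v0_of(psi0: float) -> float:
    """Uniform bound 4/psi0 + 4(1 - psi0)^3 / (3 psi0^3) + 1 for psi0 <= n/N <= 1 - psi0."""
    return 4 / psi0 + 4 * (1 - psi0) ** 3 / (3 * psi0 ** 3) + 1


def check_M(v0: float, steps: int = 10_000) -> bool:
    """Grid check of 1 - exp(-v) >= v / M for 0 <= v <= v0."""
    M = M_of(v0)
    return all(1 - math.exp(-v) >= v / M - 1e-12
               for v in (v0 * i / steps for i in range(steps + 1)))


if __name__ == "__main__":
    print(v_N(5, 40), M_of(v_N(5, 40)), check_M(v_N(5, 40)))
```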

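Therefore, for all $0 < t \le \lambda$, taking $u = \lambda/\sqrt{n}$ above, we have

$$
\begin{aligned}
P\big(\sqrt{n}(\bar X_n - \mu) \ge t\big) &= \sum_{k\ge k_0}\frac{\binom{D}{k}\binom{N-D}{n-k}}{\binom{N}{n}}\\
&\le \frac{K_2}{\sqrt{n}\,u}\exp(-nh(u))\exp\left(nu\left(u - \frac{t}{\sqrt{n}}\right)\left[\frac{4}{1-\frac{n}{N}} + \frac{4}{3}\left(\frac{n}{N-n}\right)^3 + 1\right]\right)\\
&= \frac{K_2}{\lambda}\exp\left(-nh\left(\frac{\lambda}{\sqrt{n}}\right)\right)\exp\left(\lambda(\lambda - t)\left[\frac{4}{1-\frac{n}{N}} + \frac{4}{3}\left(\frac{n}{N-n}\right)^3 + 1\right]\right),
\end{aligned}
$$

which gives inequality (ii). Inequality (iii) is obtained by setting $t = \lambda$. This completes the proof. □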